Method for identifying or detecting genomic rearrangements in a biological sample

ABSTRACT

A method for detection, visualization and/or comparison of polynucleotide sequences of interest using specially designed sets of long and short probes that enhance resolution and simplify visualization and detection. Probe compositions useful for practicing this method and procedures for identifying useful probes and probe combinations. These methods are useful for the detection of genomic rearrangements, especially those associated with various diseases, disorders and conditions including cancer or for assessment of genomic rearrangements associated with therapy. The probe compositions may be used in kits for detection of genetic rearrangements or in companion diagnostic products or kits, such as kits for the diagnosis or assessment of predisposition to cancer such as colorectal cancer.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. Ser. No. 14/816,397, filed Aug. 3, 2015, which is a continuation of U.S. Ser. No. 13/665,440, filed Oct. 31, 2012, which claims priority to U.S. Provisional Application No. 61/553,889, filed Oct. 31, 2011, the entire contents of which are incorporated herein by reference. On Oct. 30, 2012, International Application PCT/IB/2012/002423 was also filed with the same title, the entire contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

Field of the Invention

The invention relates to a high-resolution, precise method for detecting genomic rearrangements in vitro using specially designed combinations of polynucleotide probes. The invention concerns accurate methods of detection and diagnosis of conditions, disorders and diseases associated with rearrangement of genomic DNA.

Description of the Related Art

The Multigenic Paradigm of Human Diseases

Advances in genetic analysis of human diseases have provided better insights into the molecular mechanisms contributing to disease initiation and progression. Earlier work associated particular diseases with single base mutations in somatic genetic sequences, or with particular single nucleotide polymorphisms (“SNPs”) in genomic DNA, through association and/or linkage disequilibrium. Newer technologies have provided evidence that larger genetic alterations and rearrangements are associated with, or can constitute major causes of, diseases, disorders or conditions having a genetic origin or basis. Disease associations have now moved from a monogenic to a multigenic paradigm in which a disease's origins and progression are mainly linked to more than one genetic mutation or origin. While these new insights provide better avenues for disease detection and treatment, they also highlight the need for combinatorial genetic analysis that goes beyond detection of single mutational events or SNPs by assessing disease associations with larger genomic rearrangements. Such combinatorial genetic analysis would provide a better, more precise and accurate diagnosis of a particular condition, disorder, disease or pathology, and would also help establish a more appropriate medical survey and more accurate therapeutic decisions and interventions, as well as help in assessing the efficacy of such therapies and interventions.

Multigenic Causes of Genetic Disease

Genetic disorders manifesting the same or similar clinical signs and consequences can arise from both single and exclusive, or combined, mutations in various genes. Such mutations can fall within the class of single base alterations and/or the class of large genetic rearrangements. A few examples of such genetic disorders are Fragile X syndrome (mutations and expansions in the FMR1 gene), Ataxia Telangiectasia (single base pair mutations in either intronic or exonic sequences as well as deletions and translocations of the ATM gene), Seckel syndrome (mutations as well as large rearrangements in SCKL1, SCKL2, SCKL3, PCTN and ATR), autism (mutations as well as large rearrangements in GLO1, MTF1 and SLC11A3), Spinal Muscular Atrophy (mutations, deletions, transconversions as well as cis-duplications involving the SMN1 and SMN2 genes) and myotonic dystrophy (trinucleotide/tetranucleotide expansions in DM1 and DM2).

Multigenic Causes of Cancer Predisposition

In the case of cancer predisposition, there are several examples of familial cancer predisposition syndromes for which several causative genes can be named and in which single base alterations and/or large rearrangements have been identified.

Breast and Ovary Cancer. Causative genes: BRCA1, BRCA2, ATM . . . mutation type: higher proportion of point mutations identified so far.

Hereditary nonpolyposis colorectal cancer (Lynch syndrome). Causative genes: MSH2, MLH1, MSH6, EPCAM, . . . mutation type: equivalent proportion of point mutations has also been identified.

Multigenic Causes of Cancer Progression

Cancer progression is surely the human disease domain where the monogenic causative hypothesis has been definitively ruled out for several years. First, the disease's initiation is strictly dependent on two molecular events (immortalizing and transforming) due to genetic alterations in at least two independent genes classified as either oncogenes or tumor suppressor genes. Second, the disease's progression is linked to additional genetic alterations independent of the causative ones. Not only do these additional alterations play a role in cancer progression, they have also been demonstrated to be the basis for the appearance of resistance to therapy during treatment. Strikingly, among cancer-related genes, only extremely rare examples are subject solely to discrete single base mutations (e.g., KRas or BRaf); the large majority are subject either only to large rearrangements (e.g., HER2, ALK . . . ) or to both single base mutations and large rearrangements (p53, c-myc, c-Met, EGFR . . . ).

The identification and characterization of multigenic conditions, disorders and diseases, including cancer, cardiovascular disease, diabetes and other heritable genetic conditions, has been made difficult in part by the imprecision of existing methods of molecular diagnosis. Molecular Combing is probably the sole approach allowing detection of all types of large genetic rearrangements (deletions, amplifications, expansions, inversions, translocations . . . ) even in a complex and heterogeneous population (such as tumors).

High resolution barcodes allowing multiplex analysis of patients could help diagnosis at different levels, such as for patient stratification/classification and/or prognosis.

Multiplex High Resolution Barcodes for Identifying the Right Genetic Alterations as a Key Driver for Therapeutic Intervention

The Example of Myotonic Dystrophy

Myotonic Dystrophy (DM1) and Myotonic Dystrophy 2 (DM2) are two muscular dystrophies characterized by trinucleotide/tetranucleotide expansions in two different genes. While severe forms of DM1 can be clinically differentiated from DM2, milder DM1 forms display clinical signs extremely similar to those of DM2. There is currently no cure or treatment specific to myotonic dystrophy. However, DM1 patients exhibit complications of the disease (heart problems, cataracts . . . ) that do not occur in DM2 and that can be treated but not cured. Differentiating DM1 and DM2 by the use of a multiplex assay of high resolution barcodes could thus help in preventing and treating secondary effects.

The Example of Hereditary Breast and Ovary Cancer

In certain countries (U.S.), detecting constitutional alterations in BRCA1/2 leads to therapeutic intervention (surgery/reconstitution). Thus, there is a clear need for an accurate diagnostic covering all the potentially involved genes. Such a test could be made on the basis of a multiplex assay of high resolution barcodes comprising large chromosomal regions around genes known to be involved in this syndrome: BRCA1, BRCA2, ATM, ATR . . .

DNA Damage and Response Inhibitors Example

Synthetic lethality has become an important basis for therapeutic decisions to include cancer patients in specific protocols/regimens. One of the first examples was given with the demonstration that breast cancer patients with BRCA deficiency exhibit a higher sensitivity to PARP inhibitors, a new category of drug acting on the DNA Damage and Response pathway. More recently, this was extended to other types of inhibitors in this category, such as ATM inhibitors, but also to more traditional anti-cancer drugs, including all types of DNA polymerase and replication inhibitors.

Not only was this concept extended to other inhibitors, but it was also demonstrated that it could be extended to other types of cancers such as lung cancer and metastatic melanoma.

Here, a multiplex high resolution barcode will allow detection of genetic alterations in genes involved in DNA damage and response that could help predict sensitivity to this class of inhibitors. A list of such genes could include BRCA1, BRCA2, ATM, ATR, MSH2, MLH1, MSH6, EPCAM . . .

The Lung Cancer Example

Numerous alterations involved in lung cancer could be multiplexed for a better patient classification, such as:

-   LOH/Deletion (P53, STK11, LKB1, BRG1, KLF6);
-   Amplification (FGFR1, MET, EGFR, HER2 . . . );
-   Translocation (ALK).

All these genetic alterations are associated with therapeutic treatments:

-   P53: Nutlin (low doses of Actinomycin D produce similar effects)
-   FGFR1: Masitinib, PD173074, SU5402, TK1258, AZD4547 . . .
-   MET: GSK1363089, ARQ197, SGX523, XL184 . . .
-   EGFR: Tarceva, Erbitux, Vectibix . . .
-   HER2: Herceptin, Lapatinib . . .
-   ALK: Crizotinib

As at least 30% of NSCLCs were demonstrated to be dependent on at least one of these mutations, defining the genetic profile of the tumor could help drive therapeutic options. This could be made possible by designing multiplex assays combining high resolution barcodes covering these major genetic loci.

Localization of (Genetic) Sequences of Interest

Genetic sequence is the most fundamental information for synthesizing functional proteins. Alteration of a genetic sequence sometimes results in loss of functional protein synthesis. In addition to sequence alteration, loss or gain of genetic sequence (copy number variation, CNV) can also disrupt the homeostasis of cellular activity. For example, loss of a (functional) anti-tumor protein (p53) or gain of a proto-oncogene (c-myc) results in a cancer-prone cell. When such a mutation occurs (or exists) in a germ cell, it is present in every cell of the individual, who is then either a carrier of or a patient with a genetic disease, or has a predisposition to cancer. The germline mutation can be heritable. CNV is becoming increasingly important to understand in the field of genetics (ref 1). However, copy number count alone is not always sufficient and it is often critical to establish the actual location of sequence elements. This is strikingly the case for, e.g., balanced translocations. DNA sequencing and CNV detection methods such as array-based comparative genomic hybridization (aCGH) and quantitative PCR generally cannot detect these balanced mutations because these methods only assess whether the sequence and the copy number are correct. FISH and its extended forms such as fiber-FISH or molecular combing can address these balanced mutations, with resolution and precision that depend on the method.

Resolution and Precision

The use of BAC/PAC/cosmid probes on targeted regions was successfully conducted to detect large (a few kb to tens of kb) genomic rearrangements (ref 2). In these approaches, the minimum size of detectable events (e.g., the size of the deleted or amplified sequence), hereafter designated as the “resolution” of such an assay, is limited due to the large standard deviation involved in measuring probes or gaps of tens of kilobases. Indeed, in such assays the standard deviation of measurements increases with the length of the measured element. For example, a 40 kb probe is measured with a standard deviation of ~5 kb. Thus, if 16 measurements of a given probe are made on a slide, the precision on the size of the probe obtained as the mean value of measurements is of the order of 2.5 kb (considering that the distribution is Gaussian and that the precision is the half-width of the confidence interval, i.e. 2·sd/√n, where sd = standard deviation and n = number of measurements). For a 10 kb probe, where the standard deviation is ~2 kb, the precision would be ~1 kb. This illustrates the fact that shorter probes allow for better (lower) resolution.
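The arithmetic above can be reproduced with a short Python sketch. The function name `mean_precision_kb` is hypothetical and chosen for illustration only; the formula 2·sd/√n and the numerical values (sd of about 5 kb or 2 kb, 16 measurements) are those given in the preceding paragraph.

```python
import math


def mean_precision_kb(sd_kb: float, n: int) -> float:
    """Half-width of the confidence interval on the mean size: 2 * sd / sqrt(n).

    Hypothetical helper; assumes Gaussian-distributed measurements, as stated
    in the text.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of measurements")
    return 2.0 * sd_kb / math.sqrt(n)


# 40 kb probe, sd of about 5 kb, 16 measurements
print(mean_precision_kb(5.0, 16))
# 10 kb probe, sd of about 2 kb, 16 measurements
print(mean_precision_kb(2.0, 16))
```

With these inputs the two calls give 2.5 kb and 1.0 kb respectively, matching the figures quoted for the 40 kb and 10 kb probes.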

Besides, the location of such an event (the position of the extremities of the event) may be defined with a precision (hereafter the location precision) limited by the size of the probe or gap within which it occurs: e.g. if a 40 kb probe is estimated to measure 39 kb in a sample, one can conclude that a 1 kb deletion occurred somewhere within the probe, with no further precision; thus, somewhere in a 40 kb genomic region. If the same 1 kb deletion had occurred within a 10 kb probe, the location of that deletion would be known with a better precision, as the range would be reduced to a 10 kb genomic region. Therefore, the smaller the probes and gaps, the better the location precision.

There are limits to small probes: (i) below a certain size, they become difficult to detect; (ii) they involve more complex color schemes (as there are relatively more probes); (iii) there are more distinct probes to cover a given region, and the experiments are therefore more expensive and time-consuming; (iv) most importantly, fast and reliable identification of probes, whether by a human operator or a piece of software, is easier with longer probes, as they are more readily distinguished from background. Indeed, background is mainly constituted of roughly circular fluorescent spots. When large enough, the shape of these spots allows one to easily distinguish them from probes. However, when their size is small enough, they are difficult to distinguish from small probes.

In operating conditions according to the invention, probes shorter than ~3 kb are detected with a diminished efficiency. Within the 3-10 kb range, the standard deviation of measurements varies little, and there is therefore little benefit in resolution with the shorter probes within this range. Therefore, this range is usually considered to be a good compromise for probe size. However, in cases where probes are close enough (less than 10 kb gaps), smaller probes (within the 500-3000 bp range) are still useful, as they will be detected in at least a fraction of signals and the presence of the corresponding sequences may therefore be established with certainty. It was also found that detection of isolated probes longer than 12 kb (preferably longer than 14 kb) is more reliable, whether for a human operator or for automatic detection software.
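The size ranges discussed above can be summarized as a small decision rule. The following Python sketch is illustrative only: the function name `classify_probe` and the category labels are hypothetical, and the thresholds (0.5 kb, 3 kb, 10 kb, 12 kb, and the 10 kb gap) are simply those quoted in the preceding paragraph.

```python
def classify_probe(length_kb: float, nearest_gap_kb: float) -> str:
    """Place a probe in one of the size classes discussed in the text.

    Hypothetical helper. nearest_gap_kb is the distance to the closest
    neighbouring probe; it only matters for probes below ~3 kb.
    """
    if length_kb > 12.0:
        # Reliably detected even when isolated (14 kb is the preferred limit).
        return "long"
    if 3.0 <= length_kb <= 10.0:
        # Compromise range: detected efficiently, small standard deviation.
        return "short"
    if 0.5 <= length_kb < 3.0:
        # Detected with diminished efficiency; still useful next to other probes.
        return "small-clustered" if nearest_gap_kb < 10.0 else "small-isolated"
    # Lengths the text does not assign to a class (below 0.5 kb, or 10-12 kb).
    return "unclassified"


for length, gap in [(14.0, 50.0), (5.0, 50.0), (1.0, 5.0), (1.0, 20.0)]:
    print(length, gap, classify_probe(length, gap))
```

Lengths between 10 kb and 12 kb are deliberately left unclassified here because the paragraph above does not assign them to either class.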

Exclusion of Repeats

Eukaryotic genomic DNA contains various repetitive sequences, i.e., sequences that appear more than once (and more than statistically predicted based on their length and base content) in a normal haploid genome. Among these, some appear with very high frequency (tens of thousands to millions of copies). In human genomic DNA, the most abundant of these is the Alu family, which has ~1,000,000 copies constituting ~10% of the genome. In any hybridization procedure involving human genomic DNA, it is expected that probes carrying such repeats would hybridize on numerous targets, generating non-specific signal from regions throughout the genome. Other types of repetitive sequences exist, with lower frequency, and often more specific localization. The number of copies and repeat sequence length may vary widely, as well as the degree of homology. Beta-satellite sequences, for example, are present in multiple copies (hundreds to thousands), usually as tandem repeat arrays comprising hundreds of copies of the same 50-100 bp long sequence, specifically localized in a limited number of loci. Strategies to get rid of the non-specific signals depend on the type of procedure and probe. Schematically, when probes are very short sequences of DNA (oligonucleotides, typically less than 100 bp), as in aCGH procedures, the sequence of the oligonucleotides is chosen to be free of repetitive sequences, by comparison with repetitive sequences found in databases. This strategy is only practical for very short probes, as short sequences free of repetitive sequences are relatively abundant, but impractical for longer probes, as long stretches completely devoid of repetitive elements are rare (although this has been adapted to longer FISH probes, in an approach that suffers multiple drawbacks, see below). Besides, even for short probes, it constrains the design of probes heavily and some genomic regions, rich in repetitive sequences, have lower density of coverage (and thus lower resolution of events) due to this constraint.

When probes are longer (typically PCR products or cloned DNA inserts, 1 to 150 kb), in Southern Blot or in FISH procedures, non-labeled competitive DNA, enriched in repetitive elements such as Alu repeats (usually Cot-1 DNA), is added in large excess along with the labeled probe. Competition of unlabelled probes on the repetitive sequences minimizes the hybridization of labeled probes. This strategy is expensive and since the competitor DNA is not purely made of repetitive sequences, competition also occurs on the unique sequences for which the probes were designed, thus limiting the amount of competitor DNA that may be used. Therefore, the efficiency of this approach is limited.

An alternative approach for longer probes has been proposed by Knoll and collaborators (U.S. Pat. No. 7,014,997), resembling the strategy usually adopted for oligonucleotides: probes are chosen within sequence intervals devoid of repetitive elements. This strategy is based on bioinformatics analysis of the regions of interest and exclusion of known repetitive sequences by comparison with sequence databases. However, this approach has several limitations: prior knowledge of the repetitive sequences is required, which can be a problem e.g. in species where such knowledge is unavailable. More importantly, intervals longer than 2 kb devoid of repetitive sequences appear only once in 20-30 kb on average and are unevenly distributed, so the design of probes would be highly constrained, impairing the possibility to design a high-resolution code. This would prove especially difficult in repeat-rich regions, and/or regions where pseudogenes are located next to homologous genes of interest, since such low-copy repetitive sequences are also excluded by the strategy of Knoll and collaborators (ref. 3). Since regions targeted in rearrangement tests, e.g., for diagnostics purposes, often display these features, this approach is not suitable for the design of high-resolution barcodes and especially not if such a code is to be used for diagnostics purposes. Distinctions between this approach and the invention are disclosed in more detail below.

BRIEF SUMMARY OF THE INVENTION

The present invention concerns the field of the in vitro diagnosis and detection of genetic rearrangements and is related to a method to identify or detect genetic rearrangements in a biological sample to be tested, which are already known or which are new and which provide markers, for example, of diseases such as cancers or metabolic or foetal genetic diseases. The invention is characterized by using compositions containing purified or synthesized nucleic acid molecules (polynucleotides) having nucleotide sequences selected as short sequences with a length of less than 10 kb and associated in the said method with other different nucleic acid molecules (polynucleotides) having nucleotide sequences non-overlapping with the former ones and having a size longer than 12 kb. The selected nucleotide sequences (polynucleotides) used as probes have their naturally occurring frequently repeated sequences partly removed. The present invention also concerns improvements brought to the design of sets of probe sequences for the detection of genetic rearrangements by hybridization, as with fiber-FISH-like technologies such as Molecular Combing. The improvements described herein allow for high precision/high-resolution detection of rearrangements in time- and cost-efficient assays. This invention also relates to the use of probe sequences for diagnostics applications and companion diagnostics tests, to a method of detection of presence or absence of alterations in sequences and to a kit for the above uses. This is illustrated hereinafter with sets of nucleotide sequences corresponding to parts of at least two genes, MSH2 and MLH1, or to the regions of MSH2 and MLH1, whose mutations increase the risk of occurrence of human colorectal cancer.

The invention is related to the sets of polynucleotides or probes, labeled or not, which are specific to said genes. Presently, the detection of genetic rearrangements using current technologies is often insufficiently reliable for diagnostics use. Unlike most technologies used to detect genetic alterations, which suffer strong intrinsic limitations towards some types of rearrangements, direct technologies such as FISH or Fiber-FISH can intrinsically detect any type of rearrangement. Their use is mainly limited by their resolution. Molecular Combing, on the other hand, may reach sufficient resolution, but probe designs currently used fail to allow cost- and time-efficient high resolution analysis of rearrangements.

These improvements involve the combination within the same sets of probes of (typically shorter) probes designed to optimize the sensitive detection and precise measurement of rearrangements and (typically longer) probes to allow for fast and reliable detection of signals of interest when analyzing results. Alternative designs where the longer probes are replaced with a combination of shorter probes having equivalent functions and effects are also disclosed.

Specific aspects of the invention, based on the concept of combining small probes for resolution and long probes for ease of detection on one or more genomic region(s) of interest, are disclosed in more detail below.

The invention thus concerns a method for detecting a mutated or rearranged genomic polynucleotide (target) sequence comprising:

(a1) hybridizing a target genomic polynucleotide comprising one or more genomic region(s) of interest, where mutations or rearrangements are sought, to a set of short probes that bind to each region of interest without long gaps between the portions of the target sequence bound by the set of short probes, where on each genomic region a subset of short probes are selected so that when taken together they form a long contiguous stretch inside or outside the region of interest, and wherein the probes may optionally have frequent repetitive sequences removed and thus more generally are optionally devoid of such repetitive sequences; or

(a2) hybridizing a target genomic polynucleotide comprising one or more genomic region(s) of interest, where mutations or rearrangements are sought, to a set of short probes that bind to each region of interest without long gaps between the portions of the target sequence bound by the set of short probes and to one or more long (docking) probe(s) that bind to sequences near but outside of the region(s) of interest; wherein the sequence(s) of the long probe(s) does not overlap that of the short probes and wherein the short and/or long probes may optionally have frequent repetitive sequences removed and thus more generally are optionally devoid of such repetitive sequences;

(b) detecting the locations of hybridized probes on the genomic region(s) of interest; optionally,

(c) comparing the location of the hybridized probes on the target genomic polynucleotide sequence with one or more motifs based on the hybridization of said probes to a reference, control, normal, not mutated, or not rearranged genomic polynucleotide sequence; and optionally,

(d) correlating the presence of a mutated or rearranged genomic polynucleotide with a specific phenotype, disease, disorder, or condition.
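As a rough illustration of the comparison in step (c), the Python sketch below compares measured probe and gap lengths against a reference motif. It is a minimal sketch under stated assumptions, not the method of the invention: the function name `compare_motif`, the dictionary representation of a motif, the element names and the 1 kb tolerance are all hypothetical.

```python
from typing import Dict, Union


def compare_motif(
    reference_kb: Dict[str, float],
    measured_kb: Dict[str, float],
    tolerance_kb: float = 1.0,
) -> Dict[str, Union[float, str]]:
    """Report elements of a motif that deviate from the reference.

    Hypothetical helper. Each dictionary maps an element name (probe or gap)
    to its length in kb. Returns the signed deviation (measured - reference)
    for elements outside the tolerance, or "missing" for undetected elements.
    """
    deviations: Dict[str, Union[float, str]] = {}
    for name, ref_len in reference_kb.items():
        if name not in measured_kb:
            deviations[name] = "missing"
            continue
        delta = measured_kb[name] - ref_len
        if abs(delta) > tolerance_kb:
            deviations[name] = round(delta, 2)
    return deviations


# Illustrative values only, not taken from any probe set in this document.
reference = {"P1": 8.0, "gap1": 4.0, "P2": 6.0}
sample = {"P1": 8.3, "gap1": 4.1, "P2": 3.0}
print(compare_motif(reference, sample))
```

In this example only `P2` is reported, with a deviation of -3.0 kb, which would be consistent with a deletion of about 3 kb somewhere within that probe. In practice the tolerance would be chosen from the measurement precision of each element.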

The mutated or rearranged genomic polynucleotide sequence can be obtained from a subject who has cancer or who is suspected of having cancer, for example, from a subject who has colorectal cancer or who is suspected of having colorectal cancer. In such a case, the short and long probes identify mutations or genomic rearrangements associated with colorectal cancer and a control or reference sample would not contain these mutations or rearrangements. The presence or risk of developing colorectal cancer is assessed by comparing a target genomic polynucleotide sequence with the reference and determining whether a mutation or rearrangement associated with colorectal cancer is present. This method can be practiced with specific probes corresponding to or derived from Probe sets 1, 2, 3 and 4. For colorectal cancer, a genomic region of interest can be selected from genes associated with this disease, such as MSH2, MLH1, MSH6, PMS2 or EPCAM.

Similarly, the method may be applied to samples obtained from subjects having or at risk of developing other kinds of cancer, such as breast cancer, ovary cancer, or lung cancer. The method may also be applied to samples obtained from subjects having or at risk of other kinds of diseases, disorders, or conditions, including cardiovascular disease, diabetes, neuromuscular disorders such as myotonic dystrophy or spinal muscular atrophy, or to samples obtained from a subject who has, is suspected of having, or is suspected of being a carrier for a genetic or hereditary disease, disorder or condition, including known or unknown foetal genetic alterations. The sample can be obtained from a subject having a multigenic genetic or hereditary disease, disorder or condition or a genetic or hereditary disease, disorder or condition associated with rearrangement of genomic DNA.

In some aspects of the invention, the sample will be obtained from a subject undergoing treatment for a disease, disorder or condition associated with a genomic or somatic genetic rearrangement and the results obtained are compared to results obtained at other time points before, during or after the termination of treatment. A companion test for evaluating the efficiency of a therapeutic drug on the mutated or rearranged nucleotide sequences of the gene or the region of the gene of interest can be performed using the short and long probes according to the invention.

Preferably, in the method described above, the hybridizing with the short and long probes in step a) will be performed simultaneously.

Preferably, the short probes range in length from 0.5 kb to 10 kb and the maximum size of the gaps between the short probes when they are bound to the target is 15 kb, preferably 12 kb and more preferably 10 kb.

The number of short probes employed in the method described above can range from 1, 2, 3 to 10, 15 or more.

The maximum size for the long probes is 150 kb and these probes preferably range from 12 kb to 40 kb in length. Preferably, in order to have “long probe(s) that bind to sequences near but outside of the region of interest”, the distance between the long probes and the region of interest is no longer than 150 kb, more preferably no longer than 75 kb and even more preferably no longer than 25 kb. The minimum size for a genomic region to be tested or targeted is 50 kb. The minimum number of regions of interest is one for a singleplex test and two or more for a multiplex test. Examples of combinations of short and/or long probes include at least one short (less than 10 kb) sequence and at least one non-overlapping long sequence (more than 15 kb), or at least one group of at least two short sequences, less than 10 kb each, whose total group length is longer than 14 kb and less than 150 kb, hybridizing continuously on the mutated or rearranged polynucleotide sequence. The short probes can comprise a set of contiguous probes that span a stretch of the genomic polynucleotide sequences inside or outside the region of interest that is at least 15 kb.
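The preferred size and spacing parameters stated above lend themselves to a simple design check. The Python sketch below is a hedged illustration only: the `Probe` class, the `check_design` function and the coordinate convention (kb along the target) are hypothetical, while the default limits (short probes of 0.5 to 10 kb, gaps of at most 15 kb, long probes of 12 to 150 kb located within 150 kb of the region of interest and not overlapping the short probes) are the outermost values given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    """Hypothetical probe record; positions are in kb along the target."""
    name: str
    start_kb: float
    end_kb: float

    @property
    def length_kb(self) -> float:
        return self.end_kb - self.start_kb


def check_design(
    short_probes: List[Probe],
    long_probes: List[Probe],
    region_start_kb: float,
    region_end_kb: float,
    max_gap_kb: float = 15.0,
    max_long_distance_kb: float = 150.0,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    ordered = sorted(short_probes, key=lambda p: p.start_kb)

    for p in ordered:
        if not 0.5 <= p.length_kb <= 10.0:
            problems.append(f"{p.name}: short probe length {p.length_kb:g} kb outside 0.5-10 kb")

    # Gaps between consecutive short probes bound to the target.
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start_kb - left.end_kb
        if gap > max_gap_kb:
            problems.append(f"{left.name}/{right.name}: gap {gap:g} kb exceeds {max_gap_kb:g} kb")

    for p in long_probes:
        if not 12.0 <= p.length_kb <= 150.0:
            problems.append(f"{p.name}: long probe length {p.length_kb:g} kb outside 12-150 kb")
        # Distance from the long probe to the region of interest (0 if adjacent).
        distance = max(region_start_kb - p.end_kb, p.start_kb - region_end_kb, 0.0)
        if distance > max_long_distance_kb:
            problems.append(f"{p.name}: {distance:g} kb from region, limit {max_long_distance_kb:g} kb")
        if any(p.start_kb < s.end_kb and s.start_kb < p.end_kb for s in ordered):
            problems.append(f"{p.name}: overlaps a short probe")
    return problems


# Illustrative coordinates only.
shorts = [Probe("S1", 0, 5), Probe("S2", 8, 14), Probe("S3", 20, 28)]
longs = [Probe("L1", 60, 80)]
print(check_design(shorts, longs, region_start_kb=0, region_end_kb=50))
```

Tighter limits, such as the more preferred 10 kb gap or 25 kb distance, can be applied by passing different values for `max_gap_kb` and `max_long_distance_kb`.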

The long probes may have repetitive DNA sequences excluded. These repetitive sequences to be excluded would ordinarily appear more than once and more often than statistically predicted based on their length and base content; for example, repetitive sequences between 50 and 400 bp can be excluded, though shorter or longer repetitive sequences that decrease sensitivity or specificity of the method can be identified and excluded. An example of such a sequence is the repetitive Alu family DNA sequences.

According to an embodiment of the invention, in order for the probes, either short probes or long probes, to have repetitive sequences excluded, these probes are designed to hybridize in regions of the genome which are free of such repetitive sequences, i.e. which have less than 10%, preferably less than 2%, of the selected type(s) of repetitive sequences to be excluded.

In the method described above, the short and long probes are preferably fluorescently tagged and different components of the probe sets may be tagged with different labels, such as labels with different colors. Tagging provides one means to identify motifs or submotifs characteristic of a mutated or rearranged sequence.
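The 10% and 2% thresholds above can be checked on a candidate probe sequence once the repeats in it have been annotated. The Python sketch below assumes, purely for illustration, that the sequence is supplied soft-masked, with repeat bases in lowercase and unique bases in uppercase; this input convention and the function names `repeat_fraction` and `passes_repeat_threshold` are assumptions that do not come from this document.

```python
def repeat_fraction(soft_masked_seq: str) -> float:
    """Fraction of bases annotated as repeats (lowercase) in a soft-masked sequence.

    Hypothetical helper; non-letter characters such as whitespace are ignored.
    """
    bases = [c for c in soft_masked_seq if c.isalpha()]
    if not bases:
        raise ValueError("sequence contains no bases")
    return sum(c.islower() for c in bases) / len(bases)


def passes_repeat_threshold(soft_masked_seq: str, max_fraction: float = 0.10) -> bool:
    """True if the repeat content is below the chosen limit (10% by default)."""
    return repeat_fraction(soft_masked_seq) < max_fraction


# Toy sequence: 95 unique bases followed by 5 repeat-annotated bases.
candidate = "ACGTA" * 19 + "ggccg"
print(repeat_fraction(candidate))
print(passes_repeat_threshold(candidate), passes_repeat_threshold(candidate, 0.02))
```

The toy sequence has 5% repeat content, so it passes the 10% limit but not the preferred 2% limit.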

Compositions or kits comprising a set of short probes or a combination of short and long probes as described herein and optionally one or more components for binding said probes to a polynucleotide, for performing molecular combing, and/or for detecting whether hybridization has occurred are also contemplated. An example is a composition containing the short and long probe(s) described above, wherein at least two of said probe sequences detect a genetic rearrangement by using Molecular Combing, said composition comprising either at least one short (<12 kb) sequence and at least one non-overlapping long sequence (>14 kb), or at least one group of at least two short sequences, less than 10 kb each, whose total length is longer than 14 kb and less than 150 kb, hybridizing contiguously on the genetic target. The short probe(s) in such a composition may preferably range from 0.5 kb to 12 kb and the long probe(s) range from 14 kb to 40 kb. Frequent repetitive sequences described above may be removed from the probes. Examples of probe sequences are those that hybridize specifically on the MSH2 gene or in the region of the MSH2 gene or on the MLH1 gene or in the region of the MLH1 gene. Specific kinds of short probe sequence(s) where repetitive sequences have been removed include those selected from the group consisting of or comprising the sequences obtained by PCR amplification on human genomic DNA using the primer pairs described in Table 1 in the lines:

MSH2-v1

P3 (primer pairs P3a_MSH2-v1 to P3c_MSH2-v1, SEQ ID NO:21-26)

P4 (primer pairs P4a_MSH2-v1 to P4b_MSH2-v1, SEQ ID NO:27-30)

P5 (primer pairs P5a_MSH2-v1 to P5c_MSH2-v1, SEQ ID NO:31-36)

P6 (primer pairs P6a_MSH2-v1 to P6b_MSH2-v1, SEQ ID NO:37-40)

P7 (primer pairs P7a_MSH2-v1 to P7c_MSH2-v1, SEQ ID NO:41-46)

P8 (primer pairs P8a_MSH2-v1 to P8b_MSH2-v1, SEQ ID NO:47-50)

P9 (primer pairs P9a_MSH2-v1 to P9c_MSH2-v1, SEQ ID NO:51-56)

P10 (primer pairs P10a_MSH2-v1 to P10b_MSH2-v1, SEQ ID NO:57-60)

MLH1-v1

P3 (primer pairs P3a_MLH1-v1 to P3d_MLH1-v1, SEQ ID NO:95-102)

P4 (primer pairs P4a_MLH1-v1 to P4b_MLH1-v1, SEQ ID NO:103-106)

P5 (primer pairs P5a_MLH1-v1 to P5b_MLH1-v1, SEQ ID NO:107-110)

P6 (primer pair P6a_MLH1-v1, SEQ ID NO:111-112)

P7 (primer pair P7a_MLH1-v1, SEQ ID NO:113-114)

P8 (primer pairs P8a_MLH1-v1 to P8d_MLH1-v1, SEQ ID NO:115-122)

and the short probes may be used in combination with the long probe sequence(s) selected from the group consisting of or comprising the sequences obtained by PCR amplification on human genomic DNA using the primer pairs described in Table 1 in the lines:

MSH2-v1

P11 (primer pairs P11a_MSH2-v1 to P11c_MSH2-v1, SEQ ID NO:61-66)

P12 (primer pairs P12a_MSH2-v1 to P12e_MSH2-v1, SEQ ID NO:67-76)

MLH1-v1

P9 (primer pairs P9a_MLH1-v1 to P9c_MLH1-v1, SEQ ID NO:123-128)

P10 (primer pairs P10a_MLH1-v1 to P10e_MLH1-v1, SEQ ID NO:129-138).

Specific kinds of contiguous short probe sequence(s) forming long stretches include those selected from the group consisting of or comprising the sequences obtained by PCR amplification on human genomic DNA using the primer pairs described in Table 1 in the lines:

MSH2-v2

PE1-2 (primer pairs PE1_MSH2-v2 to PE2_MSH2-v2, SEQ ID NO:163-166) and

PE3-6 (primer pairs PE3_MSH2-v2 to PE5-6_MSH2-v2, SEQ ID NO:167-172), together forming one stretch;

PE9 (primer pairs E9_MSH2-v2 and I9-10_MSH2-v2, SEQ ID NO:185-188),

PE10 (primer pair E10_MSH2-v2, SEQ ID NO:189-190),

PE11 (primer pairs E11_MSH2-v2 and I11-12_MSH2-v2, SEQ ID NO:191-194),

PE12-14 (primer pairs E12_MSH2-v2 and E13-14_MSH2-v2, SEQ ID NO:195-198) and

PE15-16 (primer pairs E15_MSH2-v2 and E16_MSH2-v2, SEQ ID NO:199-202), together forming one stretch;

MLH1-v2

PE1-2 (primer pairs E1_MLH1-v2 and E2_MLH1-v2, SEQ ID NO:227-230),

PE3-4 (primer pairs I23_MLH1-v2, E3_MLH1-v2 and E4_MLH1-v2, SEQ ID NO:231-236),

PE5-6 (primer pairs E5_MLH1-v2 and E6_MLH1-v2, SEQ ID NO:237-240),

PE7-9 (primer pairs E7-8_MLH1-v2 and E9_MLH1-v2, SEQ ID NO:241-244) and

PE10-11 (primer pairs E10_MLH1-v2 and E11_MLH1-v2, SEQ ID NO:245-248), together forming one stretch.

The primers designed for the purpose of preparing short probes of the invention may have a sequence of 20 to 40 nucleotides and comprise in their 3′ end a sequence of at least 20 contiguous nucleotides that base pairs with the target. The primer sequence thus may also comprise additional nucleotides that do not base pair with the target in its 5′ end. The nucleotides which do not base pair may be useful for the construction of the primers or for the cloning of the amplified sequence resulting from polymerization starting from the primers. In a particular embodiment the sequence of the primer that hybridizes to the target is longer than 20 nucleotides.

Molecular Combing is a powerful FISH-based technique for direct visualization of single DNA molecules that are attached, uniformly and irreversibly, to specially treated glass surfaces (Herrick and Bensimon, 2009; Schurra and Bensimon, 2009). This technology considerably improves the structural and functional analysis of DNA across the genome and is capable of visualizing the entire genome at high resolution (in the kb range) in a single analysis.

Another embodiment of the invention is a method for designing a set of short probes or set of short and long probes as described above comprising:

identifying a polynucleotide containing a genomic region of interest,

selecting long probe sequences outside of the genomic region of interest but within 100 kb of the closest probe in the region of interest, and preferably within 30 kb of the closest probe in the region of interest and optionally removing frequently repeated sequences from said long probe sequences,

selecting short probe sequences from within the genomic region of interest so that no gaps longer than 20 kb, and preferably no gaps longer than 12 kb, appear between the short probes; or selecting a series of short probes that together form a long continuous stretch that covers the genomic region of interest;

hybridizing the probes to a genomic polynucleotide comprising the genomic region of interest,

detecting the hybridized probes, and

determining which sets of probes form motifs that specifically identify the genomic sequence of interest from a reference genomic sequence.

The location of the hybridized probes on the target genomic polynucleotide sequence may be compared with one or more motifs based on the hybridization of said probes to a reference, control, normal, not mutated, or not rearranged genomic polynucleotide sequence, as disclosed in the databanks or experimentally obtained on samples.
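The spacing limits stated in the design steps above can be checked mechanically before any hybridization is attempted. The following is a minimal sketch only, assuming probes are given as coordinates in base pairs on a reference sequence; the `Probe` class, the function names and the example coordinates are hypothetical illustrations and are not taken from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    start: int  # bp coordinate on the reference sequence (inclusive)
    end: int    # bp coordinate on the reference sequence (exclusive)


def largest_gap(probes):
    """Largest uncovered distance (bp) between consecutive probes."""
    ordered = sorted(probes, key=lambda p: p.start)
    reach = ordered[0].end
    largest = 0
    for probe in ordered[1:]:
        largest = max(largest, probe.start - reach)
        reach = max(reach, probe.end)
    return largest


def distance_to_closest(long_probe, short_probes):
    """Distance (bp) from a long probe to the nearest short probe."""
    return min(
        max(0, s.start - long_probe.end, long_probe.start - s.end)
        for s in short_probes
    )


def check_design(short_probes, long_probes, max_gap=20_000, max_distance=100_000):
    """Return a list of violations of the stated spacing limits."""
    problems = []
    gap = largest_gap(short_probes)
    if gap > max_gap:
        problems.append(f"gap of {gap} bp between short probes exceeds {max_gap} bp")
    for lp in long_probes:
        if any(lp.start < s.end and s.start < lp.end for s in short_probes):
            problems.append(f"{lp.name} overlaps a short probe")
        elif distance_to_closest(lp, short_probes) > max_distance:
            problems.append(f"{lp.name} is more than {max_distance} bp from any short probe")
    return problems


# Hypothetical coordinates, for illustration only.
short = [Probe("S1", 0, 5_000), Probe("S2", 9_000, 14_000), Probe("S3", 40_000, 45_000)]
long_ = [Probe("L1", 60_000, 80_000)]
print(check_design(short, long_))
```

The default limits (20 kb and 100 kb) correspond to the broader ranges given above; the preferred limits (12 kb and 30 kb) can be passed as `max_gap` and `max_distance` instead.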

The techniques disclosed herein may be applied to diagnosis of disease as well as for the identification of genetic rearrangements associated with a disease, disorder or condition. They may also be used as companion diagnostics to study the responses of a subject or group of subjects who have particular rearrangements to therapy, responses to environmental agents, or the effects of lifestyle choices. Specifically, the diagnostic products and methods of the invention are useful for diagnosis and assessments for subjects having or at risk of developing colorectal cancer. High resolution barcodes allow multiplex analysis of patients for extended or expanded diagnosis at the levels of patient stratification/classification and prognosis. Thus, the techniques disclosed herein can also be used to predict the course and probable outcome of a disease, disorder or condition as well as the likelihood of progression, stability, or recovery. Multiplex high resolution barcodes also permit the identification of key genetic alterations in a subject that would benefit from a particular kind of therapy as well as a way to assess the reaction of a subject to a particular kind of therapy or therapeutic intervention. Specific embodiments of the invention include the following, which embodiments are especially carried out in vitro.

A method for detecting a mutated or rearranged genomic polynucleotide sequence comprising: (a1) hybridizing a target genomic polynucleotide comprising one or more genomic region(s) of interest, where mutations or rearrangements are sought, to a set of short probes that bind to each region of interest without long gaps between the portions of the target sequence bound by the set of short probes, said set of short probes optionally including or being in combination with a (sub)set of short probes selected so that on each genomic region some of the short probes when taken together form a long contiguous stretch inside or outside the region of interest and where the short probes may optionally have frequent repetitive sequences removed; or (a2) hybridizing a target genomic polynucleotide comprising one or more genomic region(s) of interest, where mutations or rearrangements are sought, to a set of short probes that bind to each region of interest without long gaps between the portions of the target sequence bound by the set of short probes and to one or more long (docking) probe(s) that bind to sequences near but outside of the region(s) of interest; wherein the sequence(s) of the long probe(s) does not overlap that of the short probes and wherein the short and/or long probes may optionally have some or all of the frequently repeating sequences removed; (b) detecting the locations of hybridized probes on the genomic region(s) of interest; optionally, (c) comparing the location of the hybridized probes on the target genomic polynucleotide sequence with one or more motifs based on the hybridization of said probes to a reference, control, normal, not mutated, or not rearranged genomic polynucleotide sequence; and/or, optionally, (d) correlating the presence of a mutated or rearranged genomic polynucleotide with a specific phenotype, disease, disorder, or condition.

The invention relates in particular to the method herein described wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has cancer or who is suspected of having cancer or who is susceptible to have a genetic predisposition to cancer.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has colorectal cancer or who is suspected of having colorectal cancer or who is susceptible to have a genetic predisposition to colorectal cancer, wherein said short and long probes identify mutations or genomic rearrangements associated with colorectal cancer, wherein said control, not mutated or not rearranged genomic sequence is obtained from a subject not at risk for colorectal cancer, and wherein the method comprises detecting a genomic rearrangement and assessing the presence of or risk of developing colorectal cancer when said genomic rearrangement is detected. In this method the probes can hybridize specifically on the MSH2 gene, in the region of the MSH2 gene, on the MLH1 gene, or in the region of the MLH1 gene.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has breast cancer or who is suspected of having breast cancer or who is susceptible to have a genetic predisposition to breast cancer.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has ovarian cancer or who is suspected of having ovarian cancer or who is susceptible to have a genetic predisposition to ovarian cancer.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has lung cancer or who is suspected of having lung cancer or who is susceptible to have a genetic predisposition to lung cancer.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has a cardiovascular disease, disorder or condition or who is suspected of having a cardiovascular disease, disorder or condition or who is susceptible to have a genetic predisposition to a cardiovascular disease, disorder or condition.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has diabetes or who is suspected of having diabetes or who is susceptible to have a genetic predisposition to diabetes.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has a neuromuscular disorder or who is suspected of having a neuromuscular disorder.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has, is suspected of having, or is susceptible of being a carrier for a genetic or hereditary disease, disorder or condition.

The invention also relates in a particular embodiment to a method wherein the short and long probe sequences are specific to human genes or to human genomic regions associated with cancer, colorectal cancer or a foetal genetic alteration known or unknown when said region or gene is mutated or genetically rearranged.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject who has, is suspected of having, or is suspected of being a carrier for a multigenic genetic or hereditary disease, disorder or condition or for a genetic or hereditary disease, disorder or condition associated with rearrangement of genomic DNA.

The invention also relates in a particular embodiment to a method wherein the mutated or rearranged genomic polynucleotide sequence is obtained from a subject undergoing treatment for a disease, disorder or condition associated with an inherited or acquired genomic rearrangement and the results obtained are compared to results obtained at other time points before, during or after the termination of treatment.

The invention relates to a method of any of the embodiments described herein, characterized by the following features taken individually or in any combination: the hybridizing with the short and long probes in (a2) is performed simultaneously; the short probes are 10 kb or less; and/or the short probe(s) comprise at least one short (less than 10 kb) sequence and at least one non-overlapping long sequence (more than 12 kb), or at least one group of at least two short sequences, less than 5, 6, 7, 8, 9 or 10 kb each, whose total group length is longer than 12 kb and less than 150 kb, hybridizing continuously on the mutated or rearranged polynucleotide sequence. In these methods the short probes may comprise a set of contiguous probes that span a stretch of the genomic polynucleotide sequences inside or outside the region of interest that is at least 14 kb; and/or the long probe(s) may comprise one or more docking probes of more than 14 kb and less than 40 kb. The long probe(s) may have a length of at least 14 kb and bind to a polynucleotide sequence outside the region of interest.

Both the long and short probes may be designed to exclude frequently occurring repetitive DNA sequences. These repetitive DNA sequences, which may be excluded from the long and short probes, will generally appear more than once and more often than statistically predicted based on their length and base content. For example, a repetitive DNA sequence between 50 and 400 contiguous nucleotides in length, which appears more than once and more often than statistically predicted based on its length and base content, can be excluded from the short and/or long probe(s). Members of the Alu family of repetitive DNA sequences are one example of repetitive sequences that can be excluded from the short and long probes.

In some embodiments of the invention the probes in (b) of the first embodiment are fluorescently tagged so that they can be detected fluorometrically. In other embodiments, in (b) each probe is tagged with one of two or more fluorescent tags.

According to other embodiments of the methods above, motifs or easily identifiable subsets of the probes are detected and compared instead of every probe sequence.

The methods described above may employ at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more short probes. These short probes may each have a length of at least 500, 600, 700, 800, 900 or more base pairs (bp). In some embodiments of the methods above, the probes will be selected so that the gaps between short probes in the genomic region of interest are no more than 12 kb each. In further embodiments the short probes will bind to a single contiguous genomic region of interest or the short probes can be selected to bind to more than one non-contiguous genomic region of interest. The long probes used in the method above may be selected so as to be no more than 20, 30 or 40 kb. The or each of the genomic region(s) of interest in the methods described above can be selected to be longer than 50 kb.

Another embodiment of the invention is a kit comprising a set of short probes or a set of short and a set of long probe(s); and optionally one or more components for binding said probes to a polynucleotide, for performing molecular combing, and/or for detecting whether hybridization has occurred; (i) wherein the short probes comprise a set of probes that taken together bind to a long continuous stretch of the genomic region of interest; or (ii) wherein the long probes bind to sequences outside the genomic region of interest and do not overlap the short probe sequences; and optionally, where the repetitive sequences have been removed from the long and/or short probes. A kit of the invention is suitable and/or is specific for use in a method of the invention as disclosed herein. In a particular embodiment its short and/or long probes are characterized by the features described herein in relation with the methods. Such a kit may be employed for or contain instructions for the detection of genomic rearrangements associated with colorectal cancer or genetic predisposition to colorectal cancer; for the detection of genomic rearrangements associated with breast cancer or genetic predisposition to breast cancer; for the detection of genomic rearrangements associated with ovarian cancer or genetic predisposition to ovarian cancer; or for the detection of genomic rearrangements associated with lung cancer or genetic predisposition to lung cancer.

Another embodiment of the invention is a composition containing the short, or short and long probe(s) described by the first embodiment above, wherein at least two of said probe sequences detect a genetic rearrangement by using Molecular Combing, said composition comprising either (a) at least one short (less than 10 kb) sequence and at least one non-overlapping long sequence (more than 14 kb), or (b) at least one group of at least two short sequences, less than 10 kb each, whose total length is longer than 14 kb and less than 150 kb, hybridizing contiguously on the genetic target. In this composition the short probe(s) can range from 0.5 kb to 9 kb and the long probe(s) can range from 14 kb to 40 kb. The size of the short probes may range from 0.5 to 9 kb and at least 90% of the frequent repetitive sequences can be removed from the short probe sequences. This composition may contain probe sequences that hybridize specifically on the MSH2 gene or in the region of the MSH2 gene or on the MLH1 gene or in the region of the MLH1 gene.

In yet another embodiment the invention involves a method for designing short and long probes described herein in relation to methods comprising (a) identifying a polynucleotide containing a genomic region of interest, (b) selecting long probe sequences outside of the genomic region of interest but within 100 kb of the closest probe within the region of interest and optionally removing frequently repeated sequences from the long probe sequences, (c) selecting a set of short probe sequences from within the genomic region of interest so that no gaps longer than 15 kb appear between the short probes; or selecting a series of short probes that together form a long continuous stretch that covers the genomic region of interest; (d) hybridizing the probes to a genomic polynucleotide comprising the genomic region of interest, (e) detecting the hybridized probes, and (f) determining which sets of probes form motifs that distinguish the genomic sequence of interest from a reference genomic sequence.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1, which includes sub-parts identified as FIG. 1A, FIG. 1B, and FIG. 1C. (A) FIG. 1A: Dot-plot of MSH2 gene sequence on RP11-1084A21 BAC clone. (B) FIG. 1B: probe code v1 (without repetitive element) on RP11-1084A21. (C) FIG. 1C: probe code-v2 on RP11-1084A21. Diagonal lines are perfectly matched regions of DNA between two sequences. Dots represent repetitive elements. A higher density of dots (or a grey band) indicates a higher density of repetitive elements.

FIG. 2, which includes sub-parts identified as FIG. 2A, FIG. 2B, and FIG. 2C. Dot plot analysis of MLH1 region. (A) FIG. 2A: Dot-plot of MLH1 gene sequence on RP11-426N19 BAC clone. (B) FIG. 2B: probe code v1 (without repetitive element) on RP11-426N19. (C) FIG. 2C: probe code-v2 on RP11-426N19.

FIG. 3, which includes sub-parts identified as FIG. 3A and FIG. 3B. Designed probe set for MSH2 by exclusion of repetitive element. (A) FIG. 3A: theoretical probe set (labeled in red and green in microscopy experiments, represented here in grey and black, respectively), and position of exons (small numbered dots). (B) FIG. 3B: actual hybridization image corresponding to MSH2-v1 probe set. Original microscopy images consist of three channel images where each channel is the signal from a given fluorophore; these are acquired separately in the microscopy procedure. These channels are represented here as different shades on a grayscale: green probes are shown in black and red probes in gray, while the background (absence of signal) is white. The aspect ratio was not preserved: signals have been “widened” (i.e. stretched perpendicularly to the direction of the DNA fiber) in order to improve the visibility of the probes.

FIG. 4, which includes sub-parts identified as FIG. 4A and FIG. 4B. Designed probe set for MLH1 by exclusion of repetitive element. (A) FIG. 4A: theoretical probe set (red and green), and position of exons (purple dot). (B) FIG. 4B: actual hybridization image corresponding to MLH1-v1 probe set. The same color conventions are used for diagrams and microscopy images as in panels A and B of FIG. 3.

FIG. 5, which includes sub-parts identified as FIG. 5A and FIG. 5B. Designed probe set for MSH2 with docking probes (v2). (A) FIG. 5A: theoretical probe set. (B) FIG. 5B: actual hybridization image corresponding to MSH2-v2 probe set. The color conventions in this and the other 3-color microscopy images (and corresponding diagrams) are as follows: blue probes are represented in black, green probes in dark gray, red probes in light gray and the background is white.

FIG. 6, which includes sub-parts identified as FIG. 6A and FIG. 6B. Designed probe set for MLH1 with docking probes (v2). (A) FIG. 6A: theoretical probe set. (B) FIG. 6B: actual hybridization image corresponding to MLH1-v1 probe set. The same color conventions are used for diagrams and microscopy images as in FIG. 5.

FIG. 7, which includes sub-parts identified as FIG. 7A, FIG. 7B, and FIG. 7C. Validation of genomic rearrangement in MSH2 in LoVo cell line with v2 probe set. Sketches of both theoretical probe set (top; FIG. 7A) and validated rearrangement (middle; FIG. 7B) by molecular combing. The photo (bottom; FIG. 7C) is the recurrent abnormal signal set which corresponds to the deletion from exon 3 to exon 8 of MSH2 (as in middle). The same color conventions are used for diagrams and microscopy images as in FIG. 5.

FIG. 8, which includes sub-parts identified as FIG. 8A, FIG. 8B, and FIG. 8C. Validation of genomic rearrangement in MLH1 in SK-OV-3 cell line with v2 probe set. Sketches of both theoretical probe set (top; FIG. 8A) and validated rearrangement (middle; FIG. 8B) by molecular combing. The photo (bottom; FIG. 8C) is the representative signal set (observed in only a few cases) corresponding to the upstream part of the MLH1 probe set (left side of theoretical probe set). The difference in the number of observations between the MSH2 probe signal (normal) and the MLH1 signal (a part of the left side) clearly demonstrates that the deletion of exon 4 to 19 in MLH1 is homozygous (consistent with reference 7). The molecular combing test also revealed that the deletion is larger than previously reported (downstream probes from exon 19 are all deleted). The same color conventions are used for diagrams and microscopy images as in FIG. 5.

Table 1 describes primer sequences and coordinates on human genomic DNA used for hybridization fragment synthesis to design the probes of the invention. These primers, or variants thereof obtained by adding nucleotides at the ends of the described sequences and having up to 40 nucleotides, are part of the invention.

Table 2. Analysis of sequence of probe sets and their covering region. These sequences, and the sets of probes that are disclosed in particular, are part of the invention.

The sequence of each probe set or region was subjected to a RepeatMasker test and some representative values are shown in the table. Sum length: sum of the sequence lengths of all probes in each set. For MLH1 and MSH2 regions, this is the total length of each region. Repeat length: sum of sequences recognized as any sort of repeat in the human genome. This includes sequences other than SINE. Total repeat: % of repeat length in sum length. SINE: % of sequences categorized as SINE in sum length. ALUs: % of sequences categorized as Alu family sequences in sum length.

DETAILED DESCRIPTION OF THE INVENTION

The above described strategies, for the reasons mentioned, are unsuitable to design a high-resolution code for diagnostics applications using technologies such as molecular combing.

In the present invention, the probes are defined as follows: a short probe is a nucleic acid sequence complementary to a genomic sequence, which probe can be detected with a given marker (such as a fluorochrome) once hybridized on the genomic sequence. One probe may be either made of (i) one single fragment covering the whole sequence, or of (ii) several exactly contiguous fragments, and/or (iii) slightly overlapping fragments (with an overlap less than 250 bp) and/or (iv) fragments separated by a very short gap (less than 1000 bp). With such short overlaps or gaps, using Molecular Combing in our current setup, the fragments appear almost contiguous. The distance may be adjusted depending on the specific technique and experimental conditions. For example, with less resolutive conditions, longer gaps (less than 2 kb) or overlaps may be tolerated, provided fragments separated by such a gap still appear contiguous. Under more resolutive conditions, gaps should be shorter (less than 200 bp) in order for the fragments to appear contiguous. Short probes range in size from 500 bp to 10 kb.

A long probe is a nucleic acid sequence complementary to a genomic sequence, which probe can be detected with a given marker (such as a fluorochrome) once hybridized on the genomic sequence. One probe may be either made of (i) one single fragment covering the whole sequence, or of (ii) several exactly contiguous fragments, and/or (iii) slightly overlapping fragments (with an overlap less than 250 bp) and/or (iv) fragments separated by a gap (less than 3.5 kb), provided that more than 70% of the target sequence stretch is covered by probes (i.e. provided the gaps represent less than 30% of the target sequence). With such overlaps or gaps, using Molecular Combing in our current setup, the fragments are efficiently detected. The distance may be adjusted depending on the specific technique and experimental conditions. For example, with less resolutive conditions, longer gaps (less than 5 kb each, representing in total less than 50% of the sequence) or overlaps may be tolerated, provided fragments separated by such gaps are still detected efficiently. Also, under such conditions, longer probes should be used (more than 20 kb) to allow for efficient detection. Under more resolutive conditions, gaps should be shorter (less than 2 kb) in order for the fragments to be efficiently detected, and probes may still be efficiently detected with shorter size (more than 10 kb). Long probes range in size from 12 kb to 150 kb.
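The fragment rules for a long probe (gaps below 3.5 kb, overlaps below 250 bp, more than 70% coverage, total size between 12 kb and 150 kb) can be expressed as a simple check. The sketch below is illustrative only: it assumes fragments are given as (start, end) coordinates in base pairs, and the function names and the `LONG_PROBE` parameter set are hypothetical.

```python
def fragment_layout(fragments):
    """Return (gaps, overlaps, coverage) for a list of (start, end) fragments in bp."""
    ordered = sorted(fragments)
    gaps, overlaps = [], []
    covered = 0
    reach = ordered[0][0]  # right-most position covered so far
    for start, end in ordered:
        if start > reach:
            gaps.append(start - reach)
        elif start < reach:
            overlaps.append(min(reach, end) - start)
        covered += max(0, end - max(start, reach))
        reach = max(reach, end)
    span = reach - ordered[0][0]
    return gaps, overlaps, covered / span


def is_acceptable_probe(fragments, max_gap, max_overlap, min_coverage, min_len, max_len):
    """Check a fragment set against gap, overlap, coverage and size limits."""
    gaps, overlaps, coverage = fragment_layout(fragments)
    span = max(end for _, end in fragments) - min(start for start, _ in fragments)
    return (
        all(g < max_gap for g in gaps)
        and all(o < max_overlap for o in overlaps)
        and coverage > min_coverage
        and min_len <= span <= max_len
    )


# Limits stated above for a long probe in the described setup.
LONG_PROBE = dict(max_gap=3_500, max_overlap=250, min_coverage=0.70,
                  min_len=12_000, max_len=150_000)

# Hypothetical fragment coordinates, for illustration only.
print(is_acceptable_probe([(0, 6_000), (8_000, 14_000), (13_900, 20_000)], **LONG_PROBE))
print(is_acceptable_probe([(0, 4_000), (7_000, 10_000), (13_000, 16_000)], **LONG_PROBE))
```

The second fragment set respects the gap limit but covers only 62.5% of its span, so it fails the coverage condition. Other limits (for example those given for less or more resolutive conditions) can be supplied through the same parameters.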

In the present invention, the size of probes reflects the length of the genomic sequence where the probe hybridizes, independently of the number of strands in the DNA molecules. Therefore, a probe may be described as 1 kb (1 kilobase = 1000 bases) or, indifferently, as 1000 bp (base pairs): in both cases, the probe hybridizes over 1000 bases of one of the strands of the target DNA molecule (and, if the probe is double stranded, also on the 1000 complementary bases of the other strand of the target molecule).

In the present invention, a “barcode” designates a specific motif formed by a set of probes labeled with different markers, where the motif characteristics are the lengths of the probes in the set, the lengths of the gaps separating successive probes and the colors in which the probes are detected (or, more generally, the markers with which the probes are labeled).
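As an illustration of this definition, a barcode can be represented as an ordered list of probe and gap elements, each carrying a length and, for probes, a color. The sketch below is one possible representation under stated assumptions: the tuple layout, the function names and the 500 bp tolerance are hypothetical and are not part of the described method.

```python
def barcode(probes):
    """Build a motif from (start, end, color) probes given in bp.

    The result alternates ("probe", color, length) and ("gap", None, length).
    """
    motif = []
    previous_end = None
    for start, end, color in sorted(probes):
        if previous_end is not None:
            motif.append(("gap", None, start - previous_end))
        motif.append(("probe", color, end - start))
        previous_end = end
    return motif


def same_motif(observed, reference, tolerance=500):
    """True if both motifs have the same elements with lengths within `tolerance` bp."""
    if len(observed) != len(reference):
        return False
    return all(
        kind_o == kind_r and color_o == color_r and abs(len_o - len_r) <= tolerance
        for (kind_o, color_o, len_o), (kind_r, color_r, len_r) in zip(observed, reference)
    )


# Hypothetical reference code and an observed code with a shortened red probe.
reference = barcode([(0, 3_000, "green"), (5_000, 9_000, "red"), (10_000, 12_000, "green")])
observed = barcode([(0, 3_000, "green"), (5_000, 7_000, "red"), (8_000, 10_000, "green")])
print(reference)
print(same_motif(observed, reference))
```

In this example the observed red probe is 2 kb shorter than in the reference, so the two motifs are reported as different, which is the kind of deviation that would prompt closer inspection for a deletion.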

If a high coverage barcode is to be designed for high resolution, probe and space lengths need to be roughly in the 0.5 kb to 10 kb range (see above). This makes it impractical to design probes that completely exclude rearrangements, and yet are spaced closely enough for the code to allow high location precision. On the other hand, some non-specific hybridization (i.e. hybridization of [parts of] a probe on genomic regions that are not the designed target of that probe) of a probe is acceptable when using a code strategy for the reading of signals. Indeed, in applications such as Southern blot where the hybridization of a single probe is assessed or aCGH where hybridization of every probe is considered separately, the non-specific hybridization of probes on even a very limited number of regions may lead to completely unusable results. To a lesser extent, this is also the case with multiple-probe applications such as FISH, since the resolution of FISH is insufficient to distinguish genomic regions as far apart as several tens of megabases: a single non-specific hybridization would lead to unusable results if it were located close enough to the targeted region.

In molecular combing and other similar applications using a code strategy, the quantity of non-specifically hybridized probes is not an issue per se. If a probe (or fragments of a probe) hybridizes even multiple times outside the region of interest, it is unlikely it will recreate a motif sufficiently similar to the code to be confusing. Also, non-specific hybridization over short sequences (<<1 kb), even within the region of interest, would most likely not be detected, unless such sequences are sufficiently clustered to generate a long (>1 kb) stretch of non-specific hybridization. For the above reasons, the inventors have developed an alternative approach for the design of probes when the main issue is the design of one or several high resolution code(s) in one or several given genomic region(s). The main step of this approach relies only on the knowledge of the sequence of the region(s) themselves. When designing such a code, the major issue is to avoid significant non-specific hybridization within the region(s) of interest. Non-specific hybridization becomes an issue only if several probes display non-specific hybridization on neighboring sequences outside the region of interest. In the latter case, there is a risk that the pattern of probes resembles the original code, or a rearranged version of it, and this would likely lead to false conclusions. Although the invention described herein does not allow excluding such occurrences, this is relatively easily done once the method described herein has been used to exclude other non-specific hybridizations (see below).

The basis for this approach is the detection and exclusion of sequences that are repetitive within the region(s) of interest. For this, only the corresponding sequence(s) (the target sequence(s)) have to be known. One easy way to detect such repeats is the search for local sequence alignments within the target sequence(s), which can be done with e.g. a dot-plot comparison of each target sequence with itself and the other target sequences. A dot-plot is a graph with the two (sets of) sequences that are being compared forming the two axes, while dots are printed at every point where the coordinates correspond to a local homology. For example, if nucleotide x from sequence A (horizontal axis) matches nucleotide y from sequence B (vertical axis), then a dot will appear at the point with (x; y) coordinates. Graphically, local alignments appear as diagonal lines. Some more elaborate tools inspired from dot-plots are available, that compare short sequences (“words”, typically a few nucleotides/tens of nucleotides long) rather than single nucleotides, and display dots in various shades of gray depending on the extent of homology, thus allowing a direct visual reading of relaxed homologies (non-specific hybridization may well appear with incomplete homology). The comparison may also be done directly on both strands for one of the sequences, so homologies appear for both sense and reverse complement orientations. An example of such a tool is “Dotter” (ref 4).
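The word-based dot-plot principle can be sketched in a few lines. This is a simplified illustration, not the Dotter algorithm: it assumes exact word matches on a single strand (no graded homology scores, no reverse complement), and the function name and word size are hypothetical choices.

```python
def dot_plot(seq_a, seq_b, word=4):
    """Return the set of (x, y) positions where a word of seq_a equals a word of seq_b."""
    # Index every word of seq_b by its start position.
    index = {}
    for y in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[y:y + word], []).append(y)
    # Look up every word of seq_a in that index.
    dots = set()
    for x in range(len(seq_a) - word + 1):
        for y in index.get(seq_a[x:x + word], ()):
            dots.add((x, y))
    return dots


# Comparing a short sequence with itself: the main diagonal plus off-diagonal
# dots wherever the sequence repeats.
dots = dot_plot("ACGTACGT", "ACGTACGT", word=4)
print(sorted(dots))
```

In a self-comparison the main diagonal is always present; off-diagonal dots, such as (0, 4) and (4, 0) here, mark the repeated word and correspond to the repeats that the approach seeks to exclude from probes.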

With these tools, very frequent repetitive sequences, such as Alu sequences in the Human genome, appear quite clearly, as they have local homologies with numerous other sequences within the target regions. Therefore, stretches with a high frequency of these sequences appear as a gray band (horizontal or vertical depending on whether the stretch is located on the vertical or horizontal axis). The exact appearance of these stretches with dot-plot display tools will depend on settings, and possibly word size. Settings were selected such that sequence stretches longer than 200 bp with more than 80% homology appear clearly and can be located with a roughly 10 bp precision.

A sequence of 200 bp or more that contains more than 10 significant homologous sequences (less than 1, 2, 3, 4, 5, 10, 15 or 20% nucleotide mismatch or insertion/deletion) within the regions of interest is a frequent repetitive sequence, prone to generate significant non-specific hybridization. It is generally possible to design probes in such a way that they are void of these frequent repetitive sequences, thus increasing the specificity and the high resolution of the present technology compared to the published previous methods.
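The criterion above (a stretch of 200 bp or more with more than 10 significantly homologous copies) can be illustrated with a naive counter. The sketch below is a deliberate simplification: it counts mismatches only and ignores insertions and deletions, it scans a single strand, and its quadratic running time makes it suitable for illustration rather than for genomic-scale regions. The function names are hypothetical; the example uses a tiny window so that it stays readable.

```python
def mismatch_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def count_homologs(target, start, length=200, max_mismatch=0.20):
    """Count non-overlapping copies of target[start:start+length] elsewhere in target."""
    window = target[start:start + length]
    count = 0
    pos = 0
    while pos + length <= len(target):
        overlaps_query = pos < start + length and start < pos + length
        candidate = target[pos:pos + length]
        if not overlaps_query and mismatch_fraction(window, candidate) < max_mismatch:
            count += 1
            pos += length  # skip past this copy so shifted versions are not recounted
        else:
            pos += 1
    return count


def is_frequent_repeat(target, start, length=200, max_mismatch=0.20, min_copies=10):
    """Apply the 'more than min_copies homologous sequences' criterion."""
    return count_homologs(target, start, length, max_mismatch) > min_copies


# Toy target: an 8-nt unit occurring three times, separated by spacers.
unit = "ACGTTGCA"
target = unit + "T" * 10 + unit + "C" * 10 + unit
print(count_homologs(target, 0, length=8))
```

With the default parameters the function applies the 200 bp window and the most permissive (20%) mismatch limit mentioned above; stricter limits can be passed through `max_mismatch`.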

“Docking” Probes

Although, as shown above, shorter probes make for more precise localization of breakpoints and measurement of deleted or amplified sequences, they are, generally speaking, more difficult to detect with fiber-FISH techniques and molecular combing, as they appear as shorter stretches of signal, i.e., they are both smaller and less easy to distinguish from noise (fluorescent spots either unrelated to probes or to hybridization of probes). This is particularly true when considering automatic (computer-based) detection of signals.

It is therefore desirable to include longer probes in the code (for example, more than 12 kb and less than 150 kb, preferably more than 14 kb and less than 40 kb, in particular for the detection of genetic rearrangements in the regions of the MSH2 or MLH1 genes). These probes would appear as actual lines (rather than spots), readily distinguishable from noise and easily detectable due to their size. Once the signals of interest are detected, the detection of other probes located on the same DNA fiber is easier.

This is especially true using technologies such as Molecular Combing, where the linearity of the fibers implies that the other probes, if any, are located in the alignment of the first probe. Therefore, the invention provides that the inclusion of longer (>12 kb, preferably >14 kb) probes in the set of probes is a step towards easier detection of signals of interest. Not all probes in the set need to be that long: in a fast and “rough” detection step, the long probes are sought, which allows the localization of signals of interest. These probes are called “docking probes” as they allow the operator to “land” on the regions of interest efficiently. In a second step, the shorter probes are sought in the neighborhood of the docking probes (and more specifically, in the case of Molecular Combing or related technologies, in the alignment of these probes). Although a human operator can hardly execute these steps in a strictly consecutive manner, an operator who limits the initial search to the longer probes can browse through images more rapidly, at a pace that only allows these probes to be detected, and then spend more time on the images where a docking probe is seen in order to look for the other, shorter probes. As the longer docking probes would locally diminish the location precision and the resolution of the code, it is preferable for them not to be located in the region where rearrangements are sought. This is possible if the probes are located near, but not in, the region of interest, e.g. at either end of this region.

If it is desirable to consider only complete signals in the analysis of a given region (i.e. signals covering the entire contiguous region), these longer probes may also be used to assess the integrity of the region: if there is a probe located at each end and both probes are present, no breakage of the fiber has occurred during the DNA preparation or stretching step. In cases where several non-contiguous regions are analyzed in a single test, each region has to have its own “docking” probes in order to be correctly detected.
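The two-step search and the integrity check can be pictured with a small sketch. The `Segment` record, its field names and the assumption that signals on one fiber have already been extracted as start/end positions in kb are hypothetical; only the 12 kb threshold comes from the text.

```python
from dataclasses import dataclass

DOCKING_MIN_KB = 12.0  # threshold taken from the text; everything else is illustrative


@dataclass(frozen=True)
class Segment:
    """One fluorescent segment on a combed fiber (hypothetical record)."""
    start_kb: float
    end_kb: float
    color: str

    @property
    def length_kb(self):
        return self.end_kb - self.start_kb


def find_docking(segments, min_kb=DOCKING_MIN_KB):
    """Step 1: fast, rough search for long line-like signals."""
    return [s for s in segments if s.length_kb > min_kb]


def short_probes_between(segments, left, right):
    """Step 2: look for shorter probes in the alignment of two docking probes."""
    return [s for s in segments
            if s.start_kb >= left.end_kb and s.end_kb <= right.start_kb]


def region_is_intact(segments, min_kb=DOCKING_MIN_KB):
    """A region flanked by two docking probes counts as complete
    when both of them are found on the fiber."""
    return len(find_docking(segments, min_kb)) >= 2
```

A fiber on which only one docking probe is found would be set aside when only complete signals are to be analyzed.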

Continuous Stretch of Short Probes

An alternative to the “docking probes” approach above is to design the set of probes in such a way that at least some groups of shorter probes form a continuous stretch of signal. This is possible if the probe sequences are adjacent. In that case, several probes, although short enough (less than 10 kb) to provide sufficient resolution, may well combine to form a signal long enough (more than 14 kb) for fast and reliable detection. Indeed, if the operator combines color channels to view images, this stretch will still appear as a long line rather than a spot, allowing it to be distinguished from background noise. This is possible either by using common optical setups such as tri-color filters in fluorescence microscopy, or by using common image viewing software. In the case of automatic detection, it is also possible to use combined color information and therefore to make use of the very characteristic aspect of a multicolor line relative to spot-like background noise.
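For automatic detection, combining color channels amounts to merging adjacent segments regardless of color. A minimal sketch follows, assuming segments are given as (start_kb, end_kb, color) tuples and that a junction tolerance of 0.5 kb is acceptable; both are illustrative assumptions, while the 14 kb threshold is the figure quoted above.

```python
def merge_adjacent(segments, max_gap_kb=0.5):
    """Merge segments whose junction gap is at most max_gap_kb, ignoring color.
    Returns a list of (start_kb, end_kb, [colors in order])."""
    stretches = []
    for start, end, color in sorted(segments):
        if stretches and start - stretches[-1][1] <= max_gap_kb:
            prev_start, prev_end, colors = stretches[-1]
            stretches[-1] = (prev_start, max(prev_end, end), colors + [color])
        else:
            stretches.append((start, end, [color]))
    return stretches


def detectable_stretches(segments, min_len_kb=14.0, max_gap_kb=0.5):
    """Keep only merged stretches long enough to appear as a line."""
    return [s for s in merge_adjacent(segments, max_gap_kb)
            if s[1] - s[0] > min_len_kb]
```

Three adjacent probes of about 5 kb each in different colors would thus be reported as one multicolor stretch of about 15 kb, whereas an isolated 4 kb probe would not pass the length filter.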

Measurements

The probe designs described above are likely to lead to a large number of probes to be measured in a test. The usual approach to probe measurement is to measure all of the probes constituting a signal, as well as the gaps separating them. In a test with a large number of probes, the amount of work required for analyzing results is therefore increased. To balance this, the invention relates to a more efficient approach to signal measurement. This approach consists of measuring subgroups of probes that constitute easily recognizable motifs. The subgroups are two or more consecutive probes and the gaps between them, and possibly gaps at either end, chosen so that their total length remains within a reasonably precise measurement range (10-30 kb).

There is likely to be a systematic bias in the measurement of digitized images of fluorescent segments. Indeed, at the extremity of such a segment, the intensity of the signal decreases gradually when moving away from the center, until it reaches the level of the background. Depending on where the operator or the software sets the threshold for determining the actual end, there may be a systematic over- or under-estimation of the lengths. This bias is compensated for if the measured motifs have a probe at one end and a gap at the other. Therefore, it is preferable to design motifs in this way.
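Why a probe-at-one-end, gap-at-the-other motif cancels the bias can be checked with a toy model. The model assumes that thresholding shifts every probe edge outward by the same amount (negative values model under-estimation); this uniform shift and the function name are assumptions made for illustration.

```python
def measured_motif_length(elements, edge_shift_kb):
    """elements: ordered list of ("probe" | "gap", true_length_kb).
    Each probe edge is assumed to be shifted outward by edge_shift_kb.
    Only the two outer boundaries of the motif affect its measured length."""
    true_length = sum(length for _, length in elements)

    def outer_shift(kind):
        # A boundary that is a probe edge moves outward and lengthens the motif;
        # a boundary at the far side of a gap is the neighbouring probe's edge,
        # which moves into the gap and shortens the motif.
        return edge_shift_kb if kind == "probe" else -edge_shift_kb

    return true_length + outer_shift(elements[0][0]) + outer_shift(elements[-1][0])
```

With a shift of 0.25 kb, a motif made of a 4 kb probe, a 3 kb gap, a 5 kb probe and a 2 kb gap is measured at its true 14 kb, whereas the same motif without the trailing gap is over-estimated by 0.5 kb.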

If a motif is found to have an abnormal length (different from the expected theoretical length) in a given sample, it remains possible to measure the probes and gaps within this motif in order to locate the rearrangement more precisely. With this approach, it is possible to measure all of the signals in a fast and efficient way for initial screening, while keeping the location precision allowed by small probes. The somewhat lower measurement precision due to the larger size of the subgroups compared to the probes is essentially compensated for by the higher number of signals that can be measured within the same operator time.
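The initial screening step can be sketched as a comparison of measured motif lengths with their theoretical values. The 1 kb tolerance, the use of the median and the dictionary layout are assumptions for illustration; in particular, a heterozygous sample would give a mixture of normal and abnormal lengths, for which a single median is too crude.

```python
from statistics import median


def abnormal_motifs(expected_kb, measurements_kb, tolerance_kb=1.0):
    """Return {motif_name: deviation_kb} for motifs whose median measured
    length differs from the expected length by more than tolerance_kb."""
    flagged = {}
    for name, expected in expected_kb.items():
        values = measurements_kb.get(name, [])
        if not values:
            continue  # motif not observed in this sample
        deviation = median(values) - expected
        if abs(deviation) > tolerance_kb:
            flagged[name] = deviation
    return flagged
```

Only the motifs returned by this screen would then have their individual probes and gaps measured.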

Application to HNPCC—Rationale

Colorectal cancer is the 4th most frequent form of cancer in humans, and around 5% of cases are considered hereditary. The most frequent form of hereditary colorectal cancer is known as Lynch syndrome, or HNPCC (hereditary non-polyposis colorectal cancer). HNPCC raises the lifetime risk of developing cancer to as much as 80% (the lifetime risk is around 7% in the general US population). HNPCC also increases the risk of other cancers (endometrial, ovarian, stomach).

The genetic basis of HNPCC is known to be mutation of some of the Mismatch Repair (MMR) genes, such as MSH2, MLH1, MSH6 and PMS2. MSH2 and MLH1 mutations account for more than 80% of all MMR gene mutations in HNPCC. Both point mutations and large rearrangements are reported in these genes, and a particularly high percentage of large rearrangements is observed in MSH2 because of the high density of small repetitive elements in its sequence. Today, molecular diagnosis is performed after a study of the familial cancer history and tumor characterization by a microsatellite instability test.

Normally, a mutation in one allele of one of the MMR genes is sufficient for a molecular diagnosis of HNPCC. All HNPCC individuals have both a wild-type and a mutated allele. Point mutations of the targeted MMR genes can be detected by sequencing, and the current sequencing test investigates only the sequence of the exons. Large rearrangements such as deletions and amplifications (loss and gain of genetic elements, respectively) are not detected by sequencing because no altered sequence exists, and the primer binding regions used for sequencing are frequently deleted. As a result, the sequence information comes only from the wild-type allele and gives a false negative. Moreover, the MSH2 and MLH1 genes have a higher percentage of SINE repetitive elements in their sequences. To address these large rearrangements, the test should detect the presence of a deletion or amplification in the MMR genes. One approach is cartography of the MMR genes with specifically designed hybridization probes. Causal large rearrangements span a wide range, from sub-kb events to loss of the entire gene (up to 100 kb). A given cartography has to be sensitive across this wide dynamic range of mutations. To cope with this, a specific probe design was made for the MSH2 and MLH1 loci.

The present invention is also related to the detection of known or unknown genomic rearrangements. It is also related to kits containing probes according to the invention, for the detection of known or unknown genomic rearrangements and the associated pathologies, or associated predispositions to pathologies such as cancers or cardiovascular diseases, for example.

EXAMPLES Application to HNPCC—Materials and Method Probe Design v1

Each probe on a region of the gene sequence itself has a length between 3 and 6 kb. (Here, “probe” means a continuous hybridization signal, which can consist of multiple cloned DNA fragments; e.g., probe 1 of MSH2-v2 covers a 15 kb stretch and consists of five cloned DNA fragments of 3 kb. Since the gap or overlap at each junction of these five fragments is smaller than the resolution (<50 bp), they are considered, and indeed look like, a single continuous probe of 15 kb.) In the case of a rearrangement larger than the probe or gap size, an obvious change in the color pattern of the designed probes will be observed. Large rearrangements are detectable in gap regions as well as in probe regions, meaning that any rearrangement larger than 1 kb at any position in the target genes is detectable. This is a unique feature of the cartography method with high-resolution probe hybridization. Other techniques (MLPA, aCGH) can detect only rearrangements that involve a probe sequence. For genes with a high frequency of large rearrangements such as MSH2 and MLH1, the presence of repetitive elements in their sequences limits the freedom of probe design for these other technologies. Because including repetitive element sequences in their probes greatly increases false detection, their probe designs must in principle be free of repetitive elements.

Probe sequences were chosen by dot-plot analysis. The BAC clone sequence of each gene (RP11-1084A21 (Ch2:47,574,044-47,785,729) for MSH2, RP11-426N19 (Ch3:36,992,516-37,161,490) for MLH1) was self-plotted and all gray band regions were excluded from the target regions for PCR primer design. PCR primer sets were designed in the target regions with the Primer3Plus PCR primer design tool (ref 6). A list of the primer sequences is shown in Table 1A and B. Exclusion of Alu repeats was verified by both dot-plot analysis and RepeatMasker (http://www.repeatmasker.org). FIG. 1B and FIG. 2B show far fewer gray bands on the dot-plot of the probe fragment sequences against the BAC clone than on the dot-plot of the gene (containing Alu repeats) against the BAC clone. This indicates that the sequences of the designed probes do not include recurrent repetitive sequences in these target regions. RepeatMasker analysis (with the default settings of the web server) also clearly shows a dramatic reduction in the percentage of Alu sequence in the designed probe sequences (Table 2).
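A rough way to compare repeat content before and after probe design is to compute the masked fraction of a sequence. The sketch below assumes a soft-masked sequence in which repeat bases are written in lowercase; it reports all masked repeats together rather than Alu elements specifically, and the function name is hypothetical.

```python
def masked_fraction(seq):
    """Fraction of bases written in lowercase (i.e. soft-masked as repeats).
    Non-letter characters such as line breaks are ignored."""
    bases = [c for c in seq if c.isalpha()]
    if not bases:
        return 0.0
    return sum(1 for c in bases if c.islower()) / len(bases)
```

Applying the function to the gene sequence and to the concatenated probe fragment sequences would give two percentages that can be compared in the same way as the reduction reported for the designed probes.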

Probe Design v2

To facilitate “recognition” of barcodes on hybridization images, an alternative design of the probe set (called v2) was made as described in the “Docking” Probes section. The design process is the same as for v1, except that repetitive elements were not excluded on the basis of the dot-plot. For the v2 probe design, each probe was designed to be more than 3 kb long, close to the limit for being recognized as a “line”, and all exon sequences are covered by a probe stretch (no exons fall in gaps). Docking probes of 15-20 kb were designed at both extremities of each gene. For the MSH2-v2 code, specific probes covering the EPCAM gene (see the rationale section) were also included between the two docking probes. The DNA sequence of the designed code v2 was subjected to dot-plot analysis to make sure that there are no segmental repeats inside the designed region (FIGS. 1C and 2C).

Cloning of Probe Fragments and Labeling for Hybridization Probe

Each probe fragment was amplified by PCR and then ligated into a plasmid vector (pNEB193, pCR2.1-TOPO, pCRXL-TOPO). The ligation product was transformed into E. coli competent cells and the end sequences of the cloned fragment were verified. The purified plasmid DNA set of each gene was separated into two (v1) or three (v2) groups according to the colors of the theoretical barcodes (FIG. 3A and FIG. 4A for v1, FIG. 5 and FIG. 6 for the v2 probe sets). Each group of plasmid DNA was labeled by the random priming method. Either whole plasmids containing the probe fragment sequences or PCR-amplified probe fragments were used as templates for random priming. Three haptens were used for three-color detection: biotin (Biot), digoxigenin (Dig) and Alexa Fluor 488 (A488). Biot labeling was done with the BioPrime DNA labeling system (Invitrogen) according to the manufacturer's instructions. For Dig and A488 labeling, the dNTP mixture in the kit was replaced with home-blend dNTP mixtures containing, in the final labeling reaction solution, either 0.1 mM Digoxigenin-11-dUTP (Roche Applied Science) for Dig labeling or 0.1 mM ChromaTide® Alexa Fluor® 488-7-OBEA-dCTP (Invitrogen) for A488 labeling, 0.1 mM of the unmodified equivalent (dTTP or dCTP) and 0.2 mM each of the other three deoxynucleotides.

Sample DNA Preparation

Three human cell lines were used to validate the detection of large rearrangements in either MSH2 or MLH1. Cell line GM17939 was used as the non-mutated sample. Cell line LoVo, which is homozygous for a deletion of exon 3 to exon 8 in MSH2, was used for MSH2 rearrangement validation. Another cell line, SK-OV-3, which was reported to carry a homozygous deletion of exon 4 to exon 19 in MLH1, was used for MLH1 rearrangement validation. For each cell line, the cell culture was prepared according to the cell bank's instructions. Cultured cells were harvested (for LoVo and SK-OV-3, at 50-70% confluency) or collected by centrifugation (for GM17939, at between 300,000 and 400,000 cells/ml of medium). The cell pellet was resuspended in 1×PBS/Trypsin mixture to give 1,000,000 cells in 45 μl. The cell suspension was mixed with an equal volume of 1.2% (w/v) NuSieve GTG agarose solution in 1×PBS (melted and equilibrated at 50° C. in advance). The cell/agarose mixture was poured into a well of a gel plug mold, followed by gelification at 4° C. for 30 min. The gelified agarose plug was immersed in a mixture of 2 mg/ml of Proteinase K and 1% (w/v) of sarcosyl in 0.5 M EDTA (pH 8.0, 250 μl for each plug). The agarose plug was incubated at 50° C. overnight.

The next day, the incubated plug was washed in 1×TE (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) 3 times for 1 hour each. The DNA plug can be stored in 0.5 M EDTA at 4° C. The washed plug was stained in 100 μl of 33 μM YOYO-1 (Invitrogen) in TE40.2 (40 mM Tris-HCl, 2 mM EDTA, pH 8.0) for 1 hour in the dark. The stained plug was heated at 68° C. in 1 ml of combing buffer (0.5 M MES, pH 5.5) for 20 min, then cooled at 42° C. for 10 min before adding 1.5 units of beta agarase I (NEB). Beta agarase treatment was carried out overnight at 42° C. in the dark.

The following day, the treated DNA solution was poured into a combing reservoir and the level of the solution in the reservoir was adjusted with additional combing buffer.

Molecular Combing

The DNA solution was set on a Molecular Combing Machine (MCS, Genomic Vision). Molecular combing was performed on silanized coverslips (Combicoverslips, Genomic Vision). The combed coverslips were fixed at 68° C. for 4 hours, then used for hybridization (or stored at −20° C. until use).

Hybridization and Detection of Probe

For one hybridization, 5 μl of each of the labeled probe solutions (of both MSH2 and MLH1) were combined with 10 μg of sonicated herring or salmon sperm DNA and 10 μg of human Cot1-DNA (only for the v2 probe sets), then purified by standard ethanol precipitation. The precipitate was resuspended in 20 μl of hybridization buffer (50% formamide, 2×SSC, 1% SDS and BlockAid blocking solution (Invitrogen)). The resuspended probe solution was placed on a clean glass slide and covered with a DNA-combed coverslip. The slide was heated at 90° C. for 5 min for co-denaturation of both probe and combed DNA, then incubated at 37° C. overnight in a humid atmosphere for hybridization between the labeled probes and the combed DNA.

The hybridized coverslip was washed in 50% formamide/2×SSC solution 3 times for 5 min each, followed by another 3 washes with 2×SSC for 5 min each. The washed coverslip was then developed with two or three layers of fluorescently labeled antibodies or streptavidin. For each layer, the antibodies for all haptens were diluted 25 times in BlockAid blocking solution (20 μl final volume) and incubated for 20 min at 37° C. For Biot, Streptavidin Alexa Fluor 594 (Invitrogen) was used for the 1st and 3rd layers, and biotin-conjugated goat anti-streptavidin antibody was used for the 2nd layer. For Dig, AMCA-conjugated mouse anti-digoxin (Jackson ImmunoResearch) was used for the 1st layer, AMCA-conjugated rat anti-mouse (Jackson ImmunoResearch) for the 2nd layer, and Alexa Fluor 350-conjugated goat anti-rat (Invitrogen) for the 3rd layer. For A488, rabbit anti-Alexa Fluor 488 (Invitrogen) was used for the 1st layer and Alexa Fluor 488-conjugated goat anti-rabbit was used for the 2nd layer (no third antibody for A488). After the 20 min incubation of each antibody layer, the coverslip was washed in 2×SSC/1% Tween 20 washing solution 3 times for 5 min each at room temperature. After the washing of the 3rd layer, the coverslip was rinsed in 1×PBS, followed by successive baths of 70, 90 and 100% ethanol for 1 min each. The coverslip was dried at room temperature prior to microscopy.

Signal Acquisition and Measurement

The fluorescent signal of the developed antibodies on the coverslip was acquired with a standard epi-fluorescence microscope system or an automated fluorescence microscope system (ImageXpress Micro, Molecular Devices) with a custom scanning configuration for molecular combing signals. Every set of linearly aligned fluorescent signals and gaps was measured with ImageJ. Each measured set of signals (with color information) was subjected to pattern matching against the theoretical probe sets to determine its position (if the set is part of one of the probe sets) and orientation. All unclassified sets (those that did not match any position or orientation of the theoretical probe sets) were subjected to a similarity check among themselves to find whether a recurrent abnormal pattern appears.
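The pattern-matching step can be sketched as follows. The representation of a barcode as an ordered list of (color or "gap", length in kb) elements, the 1 kb tolerance and the function names are assumptions for illustration; partially visible elements at the ends of a broken fiber are not handled.

```python
def _matches_at(measured, code, offset, tol_kb):
    """True if measured agrees element by element with code starting at offset."""
    window = code[offset:offset + len(measured)]
    for (m_kind, m_len), (c_kind, c_len) in zip(measured, window):
        if m_kind != c_kind or abs(m_len - c_len) > tol_kb:
            return False
    return True


def locate(measured, code, tol_kb=1.0):
    """Return (start_index_in_code, orientation) for every placement of the
    measured set on the theoretical code, in either reading direction.
    An empty list means the set is unclassified."""
    hits = []
    n, m = len(code), len(measured)
    for orientation, candidate in (("forward", code), ("reverse", code[::-1])):
        for offset in range(n - m + 1):
            if _matches_at(measured, candidate, offset, tol_kb):
                # Express reverse hits in the coordinates of the forward code.
                start = offset if orientation == "forward" else n - offset - m
                hits.append((start, orientation))
    return hits
```

Sets for which `locate` returns an empty list against every theoretical probe set would be the ones passed on to the similarity check for recurrent abnormal patterns.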

Application to HNPCC—Results

FIGS. 3B and 4B are representative images of signals from hybridized DNA. Some of the probes look like “dots” rather than “lines”, as expected from their length. There are some “random” spots on the hybridization images, but these spots do not interfere with recognition of the designed code. Although for some small probes (arrowed in FIG. 3B, for example) the “length” of the probe signal is difficult to measure for size evaluation, measurement of the “distance” between probe signals is possible and is equivalent to measuring the lengths of the probes and gaps in a normal probe set hybridization.

FIGS. 5B and 6B are representative images of the hybridization signals of barcodes-v2. The fluorescent signals are more continuous than those of barcodes-v1, and it is easier to find the docking probes and to measure the length of each probe and gap. These barcodes-v2 were used to visualize large genomic rearrangements in characterized cancer cell lines, LoVo and SK-OV-3 (ref. 5).

FIG. 7 shows the result of hybridization of barcodes-v2 on combed DNA from the LoVo cell line, which is homozygous for a deletion in MSH2 (from exon 3 to exon 8). The hybridization slide had many normal (identical to the theoretical code) signals of the MLH1 gene but no normal MSH2 signals. Instead, there was a recurrent signal corresponding to a truncated form of the normal MSH2 signal (FIG. 7B). By deduction from the truncated signals, this truncation results from the loss of the probes and gaps corresponding to exons 3 to 8 of the MSH2 gene.

FIG. 8 shows the result of barcodes-v2 on DNA from the SK-OV-3 cell line, which is homozygous for a deletion in MLH1 (from exon 4 to exon 19). Among many normal MSH2 signals, only a few signals of part of MLH1 (from probe 1 to probe 3) were observed. This indicates that the sequence of MLH1 following these probes is missing, which is consistent with the reference. Moreover, the absence of the right docking probe (downstream of MLH1) indicates that this deletion extends beyond exon 19 of MLH1.

The sequences selected to detect predisposition to colorectal cancer linked to rearrangements in the MSH2 genomic region or the MLH1 genomic region are preferably chosen among the following nucleotide sequences and their corresponding complementary sequences and are described as:

The short probes covering the MSH2 gene region and constituting contiguous stretches (PE1-2 and PE3-6 (SEQ ID NO:354-358); PE9 to PE15-16 (SEQ ID NO:365-373) in Table 1 under the header MSH2-v2) and the other short probes covering the MSH2 gene region (PE7 and PE8, SEQ ID NO:359-364 in Table 1 under the header MSH2-v2); the long probes neighboring the MSH2 gene (tPP1, EPCAM5′, EPCAM3′ (SEQ ID NO:342-353) and cPP1 (SEQ ID NO:374-378) in Table 1 under the header MSH2-v2); the short probes covering the MLH1 gene region and constituting a contiguous stretch (PE1-2 to PE10-11, SEQ ID NO:386-396, in Table 1 under the header MLH1-v2) and the other short probes covering the MLH1 gene region (PE12-13, PE14-15 and PE16-19, SEQ ID NO:397-401, in Table 1 under the header MLH1-v2); the long probes neighboring the MLH1 gene (tPP1 (SEQ ID NO:379-385) and cPP1 (SEQ ID NO:402-408) in Table 1 under the header MLH1-v2). For example, these probes may be obtained by amplification of the fragments using the primers listed in Table 1 under the headers MSH2-v2 (SEQ ID NO:139-212) and MLH1-v2 (SEQ ID NO:213-272).

Incorporation by Reference

Each document, patent, patent application or patent publication cited by or referred to in this disclosure is incorporated by reference in its entirety, especially with respect to the specific subject matter surrounding the citation of the reference in the text. However, no admission is made that any such reference constitutes background art and the right to challenge the accuracy and pertinence of the cited documents is reserved.

TABLE 1 MSH2-v1 Name SEQ ID SEQ ID of Name of NO For / NO probe fragment(fragment) Rev (primer) Sequence (5′-3′) start end P1 P1a_MSH2-v1 273forward 1 TTCTTCCCAAGAGAGCCAAG 47595911 47595930 reverse 2CTGTTTTGGAACCCCAAGTC 47597074 47597093 P1b_MSH2-v1 274 forward 3GGCTTCAATCTGGGACTACG 47598716 47598735 reverse 4 GCTGTCACCGCCTCTTTTAC47599478 47599497 P1c_MSH2-v1 275 forward 5 GCCAGGCACTTAGGCAGTAG47600433 47600452 reverse 6 TTGGTCCTGACATCCTTTCC 47601671 47601690P1d_MSH2-v1 276 forward 7 TTAGTTGAACAGGGCATGACAC 47602097 47602118reverse 8 GGTAAAGGGGCCTGATGTC 47602743 47602761 P1e_MSH2-v1 277 forward9 GAGCCTTGATGTTCCCTCTTAAC 47603695 47603717 reverse 10ACCCAGATCCGAAACTGTTG 47604324 47604343 P1f_MSH2-v1 278 forward 11CCGGCCTTACCTTTCATTTC 47605735 47605754 reverse 12 CCAGGATCCAGATCCAGTTG47606965 47606984 P2 P2a_MSH2-v1 279 forward 13 GAGTTCCATGGCAGATCACC47612521 47612540 reverse 14 GCAGCTTTCAATCACAAATCAG 47614067 47614088P2b_MSH2-v1 280 forward 15 GAAGGGTTGGTCTTGCTGTC 47615115 47615134reverse 16 ACCCTTTGCACCTCTCTGTG 47615632 47615651 P2c_MSH2-v1 281forward 17 CCCGGTGTTGAATCATTTG 47616079 47616097 reverse 18TTCAGCCCTGAAGGTAGAGG 47617513 47617532 P2d_MSH2-v1 282 forward 19CTGGCCACTTTTTGGAAGAG 47618884 47618903 reverse 20 TGGGACGCAGAGTGATACAG47619394 47619413 P3 P3a_MSH2-v1 283 forward 21 TTACTGGCGATCCTCAGAGC47629651 47629670 reverse 22 AACGCCTCTTCCGTTGTATG 47631623 47631642P3b_MSH2-v1 284 forward 23 GAAAGGACAGACCAAGTGCAG 47632605 47632625reverse 24 AGCCTGTGCAGGGAAACTC 47633083 47633101 P3c_MSH2-v1 285 forward25 AGTGGGATGCAGCTGAAAAG 47633591 47633610 reverse 26CAACAGCATGGGAAAGATCC 47635238 47635257 P4 P4a_MSH2-v1 286 forward 27TTGAAAGTTGGTCTTAGGAAGAGG 47643286 47643309 reverse 28CCCAACAAACCTGGCTTTAG 47644179 47644198 P4b_MSH2-v1 287 forward 29AGACGCCCAAAATCAACAAC 47645155 47645174 reverse 30 CCGCTTGCTGCTAAAAATTG47646042 47646061 P5 P5a_MSH2-v1 288 forward 31 TGATTGCCAAGGAAGATTCAC47657647 47657667 reverse 32 TGGAAGTAAATGCAGGTGCTC 47658763 47658783P5b_MSH2-v1 289 forward 33 
TCATTCTTGGGTGTTTCTCG 47659578 47659597reverse 34 ATGGCGGTTTTGTGGAATAG 47660015 47660034 P5c_MSH2-v1 290forward 35 GAGGGAGAGGGAACCTTTTG 47661699 47661718 reverse 36GGGGACTATACCGCATTCAC 47662243 47662262 P6 P6a_MSH2-v1 291 forward 37TGTTGATTCATGGGCATTTG 47669651 47669670 reverse 38 GCTGGGGAATCATGTATGAAG47671879 47671899 P6b_MSH2-v1 292 forward 39 CATCAAGCACAGTTCCATTG47672243 47672262 reverse 40 TTCTCTTTCCGTTTCCAGTG 47673113 47673132 P7P7a_MSH2-v1 293 forward 41 GGAGCTTGGGAATTCAACTG 47678126 47678145reverse 42 AGAAACGGGCATGTCATAGG 47679330 47679349 P7b_MSH2-v1 294forward 43 CAGCCTACGTGCCCATTTC 47679649 47679667 reverse 44TCAAAAGATGGCCAAAATGC 47681179 47681198 P7c_MSH2-v1 295 forward 45GTGTTGCACCCATTAACTCG 47681915 47681934 reverse 46 AGCCTGGTGAGAGGTGACTG47684723 47684742 P8 P8a_MSH2-v1 296 forward 47 CACGATGCCAGTCCAATTC47689478 47689496 reverse 48 AAGGTGGACTTTAATGCAAAGG 47690835 47690856P8b_MSH2-v1 297 forward 49 GGAGTGAGAGCGACACCTTG 47691634 47691653reverse 50 CGACAGCTGACTGCTCTATGG 47694068 47694088 P9 P9a_MSH2-v1 298forward 51 CACAATGGGAAAGGATGTAGC 47701939 47701959 reverse 52CAGAGAAAAACACCCATGACC 47704112 47704132 P9b_MSH2-v1 299 forward 53CACCGTGATCCTCCTTATTTC 47704395 47704415 reverse 54 GAACAAACAACGGATGAAAGG47704945 47704965 P9c_MSH2-v1 300 forward 55 GTGGCATATCCTTCCCAATG47705311 47705330 reverse 56 CCCCCAGACTGTGAATTAAGG 47705787 47705807 P10P10a_MSH2-v1 301 forward 57 GATGCAGATCAGGGAAATGC 47711630 47711649reverse 58 ATCTTGCTGGATGGACAAGG 47715272 47715291 P10b_MSH2-v1 302forward 59 CTTAATCCTGAAAGGCAGGTG 47715788 47715808 reverse 60TGTTTCTCAGGCAACCACAG 47717266 47717285 P11 P11a_MSH2-v1 303 forward 61GAAACCACAGAATCGCCTTC 47731087 47731106 reverse 62 ACCTGGACAGTCCCACAGAC47733482 47733501 P11b_MSH2-v1 304 forward 63 CAGTGCTTTTGCATCCTTCC47734903 47734922 reverse 64 ATTTAATCCCCTGGCCAATC 47741649 47741668P11c_MSH2-v1 305 forward 65 CACCTGTGCCCATCACATAG 47742239 47742258reverse 66 GAGTCCCCTCTTGGAGAACC 47747829 47747848 P12 P12a_MSH2-v1 306forward 67 AAAGCCATTTCCAGTGTCG 
47753989 47754007 reverse 68ATTGTGCAGCCAGAATTGAG 47758158 47758177 P12b_MSH2-v1 307 forward 69TTCACAGCAAAGTGGCTCAG 47760593 47760612 reverse 70 GCTATTATGGGCTGCAAAGC47764302 47764321 P12c_MSH2-v1 308 forward 71 TTCACTCCCAACAAGCACTG47764863 47764882 reverse 72 TGCCCAGTCCTTTTTCACT 47765618 47765636P12d_MSH2-v1 309 forward 73 AATCCCTCCTGCACACTTTC 47765925 47765944reverse 74 AATGGATGCTTCCACTGTCC 47767687 47767706 P12e_MSH2-v1 310forward 75 CCATCTGTGCAATTCCTTCC 47768105 47768124 reverse 76GTTCAAAGGCAGAAGCCATC 47769886 47769905 MLH1-v1 SEQ ID Name of Name of NOFor / SEQ ID NO probe fragment (fragment) Rev (primer) sequence (5′-3′)start end P1 P1a_MLH1-v1 311 forward 77 GTCTGGATTCTTTCACAATGTAGC37005551 37005576 reverse 78 TGCCAATCTTCTCCTCTGTTC 37006562 37006582P1b_MLH1-v1 312 forward 79 AACCACCCAATGTGTTCACC 37006815 37006836reverse 80 GTTCATTCCTGCGAGTAGGC 37007422 37007441 P1c_MLH1-v1 313forward 81 GCCAAAGGTGGAAAATGTTG 37008987 37009008 reverse 82GCCTTCTTCATGAAAGCACTG 37009873 37009893 P1d_MLH1-v1 314 forward 83CCAGAAGGTGGAAGCTACAG 37011079 37011100 reverse 84 TGGGGTCAATGAAGCAAG37011830 37011847 P1e_MLH1-v1 315 forward 85 ACATCGACCCAGAAAGTTCC37012314 37012335 reverse 86 AATGTGCTTCGTACCACTGC 37012867 37012886P1f_MLH1-v1 316 forward 87 AGCGTGCCATTGTACTCTCC 37013822 37013843reverse 88 TTTCTGAGCCCATGATTTCC 37015267 37015286 P2 P2a_MLH1-v1 317forward 89 GTGCCCAGCTAGTTCCATTC 37023623 37023644 reverse 90TCAAGAGCGCTAATCCCATC 37025002 37025021 P2b_MLH1-v1 318 forward 91TGCACATGCTCACTGAAAGAC 37026505 37026527 reverse 92 TTTTGCCTGCAAACTGACC37027818 37027836 P2c_MLH1-v1 319 forward 93 CAGCAAGCACCAAATCACTG37028305 37028326 reverse 94 AGTACCAGCCGTCCAAACTG 37032621 37032640 P3P3a_MLH1-v1 320 forward 95 CCTGGCCAGAAAATTCATTG 37037607 37037628reverse 96 ACCCTGCATTCCAAACTCAC 37039199 37039218 P3b_MLH1-v1 321forward 97 GCAGTCCTTTGAGGATTTAGC 37042493 37042515 reverse 98GAAAGATATCCAACAGGAAGTGAG 37043300 37043323 P3c_MLH1-v1 322 forward 99TGGCCTTGTTTAAGGTCCTG 37043746 37043767 reverse 100 
ATGGTCCTGCTGCTTCAGAG 37044723 37044742
         P3d_MLH1-v1      323         forward  101        ACCCCGTCATAGCACAGTTC       37045295  37045316
                                      reverse  102        CAAAGGCCATTCATCAGTTTC      37046439  37046459
P4       P4a_MLH1-v1      324         forward  103        GTGGCGTGATATCCTTGATTC      37053034  37053056
                                      reverse  104        CTCTGGAATGACTGCTGCTG       37054289  37054308
         P4b_MLH1-v1      325         forward  105        TGTGCTAGATGCCTCACTGG       37055182  37055203
                                      reverse  106        TTGCCAAGAAGCACAACAAG       37058326  37058345
P5       P5a1_MLH1-v1     326         forward  107        CGGAGGCTCTACTGTTGGAC       37062345  37062366
                                      reverse  108        TGCTGTCCACTCTGGAACTG       37064753  37064772
         P5b_MLH1-v1      327         forward  109        ACATCAGAAGCCCTGGTTTG       37064571  37064592
                                      reverse  110        GCTGGGAGTTCAAGCATCTC       37067377  37067396
P6       P6a_MLH1-v1      328         forward  111        TCGGTCTCAGTCACCATTTG       37072097  37072118
                                      reverse  112        AACGCACCTGGCTGAAATAC       37075920  37075939
P7       P7a_MLH1-v1      329         forward  113        TGAACCTGCAATATCTCAGAGG     37079607  37079630
                                      reverse  114        CTTACCGATAACCTGAGAACACC    37083805  37083827
P8       P8a_MLH1-v1      330         forward  115        CCCAGCCCATATATTTTAAAGC     37088387  37088410
                                      reverse  116        CCAGCCACTCTCTGGACTATC      37089049  37089069
         P8b_MLH1-v1      331         forward  117        GACATGGAGAGCCGAATCC        37089669  37089689
                                      reverse  118        CCATTAAAATCGGGTCTGAAAG     37091446  37091467
         P8c_MLH1-v1      332         forward  119        TCCAGACCCAGTGCACATC        37091887  37091907
                                      reverse  120        CATGGTCAGTGCCATCAGAG       37092412  37092431
         P8d_MLH1-v1      333         forward  121        AGCCTCCCAAAGTTAAGTGC       37092788  37092809
                                      reverse  122        CCCAGCTAAAACCAACACAC       37093346  37093365
P9       P9a_MLH1-v1      334         forward  123        TGCCCTCAGCTACTCACTCC       37103285  37103306
                                      reverse  124        AGGGCTCAGCCTTTAGGAAC       37105620  37105639
         P9b_MLH1-v1      335         forward  125        GCCAGACTCTCGTTCCATTC       37106390  37106411
                                      reverse  126        ACTCCCCATTCAGTCCCTTC       37111053  37111072
         P9c_MLH1-v1      336         forward  127        AGGCACAACGTCAGGTTTTC       37114109  37114130
                                      reverse  128        TTGGAATTTGTCCTGGTGTG       37117519  37117538
P10      P10a_MLH1-v1     337         forward  129        CACCATTGCCAACACTTCTG       37132898  37132919
                                      reverse  130        GCCATTGGTTTGAAGGTGAC       37134201  37134220
         P10b_MLH1-v1     338         forward  131        CTTAGTCACCGCCTGTCCTC       37134738  37134759
                                      reverse  132        TAGCTGCATGTGGCTAATCG       37136986  37137005
         P10c_MLH1-v1     339         forward  133        TGTGGCTCGCATTACATTTC       37137579  37137600
                                      reverse  134        CGCTGTCATTACCTGCTTTG       37139742  37139761
         P10d_MLH1-v1     340         forward  135        TGACCTCCAAAATCATCCAG       37140449  37140470
                                      reverse  136        TTCTGAGCTAGGAGGTGCTG       37141321  37141340
         P10e_MLH1-v1     341         forward  137        CCAGATTTGTAAATCCCTGTTC     37142008  37142031
                                      reverse  138        TGTGTGGTTCTTAAGCATTCC      37142420  37142440

MSH2-v2

Name of  Name of          SEQ ID NO   For/Rev  SEQ ID NO  sequence (5′-3′)           start     end
probe    fragment         (fragment)           (primer)
tPP1     tPP1a_MSH2-v2    342         forward  139        CTCAGTCCATCAGCCTCCTC       47574824  47577784
                                      reverse  140        TGCTGTGCCCTGAGATTAAG       47574823  47577783
         tPP1b_MSH2-v2    343         forward  141        AACTTAATCTCAGGGCACAGC      47577763  47580677
                                      reverse  142        TGCAGCTTCAGCCTCTTG         47577762  47580676
         tPP1c_MSH2-v2    344         forward  143        GCGTGGTGTTTCGTACCAG        47580604  47583785
                                      reverse  144        GCTACTGGCCAGAAATCTTCC      47580603  47583784
         tPP1d_MSH2-v2    345         forward  145        GCCCAGCCCTACTAAGGAAG       47583750  47586723
                                      reverse  146        CTGTGCTCCCCTGCTAGAAC       47583749  47586722
         tPP1e_MSH2-v2    346         forward  147        GTCGTCCICTTCGACCTAGC       47586769  47589967
                                      reverse  148        CAGCGCCTATTCTACAGCAG       47586768  47589966
EPCAM5′  EPCa_MSH2-v2     347         forward  149        TTCTTCCCAAGAGAGCCAAG       47595912  47598965
                                      reverse  150        CCACCTTTAATCTGCCCAAC       47595911  47598964
         EPCb_MSH2-v2     348         forward  151        GTGTTGGGCAGATTAAAGGTG      47598944  47602122
                                      reverse  152        GCAGTGTCATGCCCTGTTC        47598943  47602121
         EPCc_MSH2-v2     349         forward  153        CTCTTIGTGCCCTITCTTTTG      47601745  47604931
                                      reverse  154        AGTTCCTTAAAGCAGAGAAGATGG   47601744  47604930
EPCAM3′  EPCd_MSH2-v2     350         forward  155        AACCTGTCCCTGTGGATGAG       47604796  47607923
                                      reverse  156        CCGAAGCATCCTTACATTCC       47604795  47607922
         EPCe_MSH2-v2     351         forward  157        AATACCTGAACCCCCAAACC       47607722  47609876
                                      reverse  158        CTCAGGCTATTTTCCAGATTCAC    47607721  47609875
         EPCf_MSH2-v2     352         forward  159        GCATGCCTGTCATTCTGG         47609695  47612812
                                      reverse  160        TCCAAGGGACTGAAACACAC       47609694  47612811
         EPCg_MSH2-v2     353         forward  161        TTAGTGTGTTTCAGTCCCTTGG     47612790  47615135
                                      reverse  162        GACAGCAAGACCAACCCTTC       47612789  47615134
PE-2     E1_MSH2-v2       354         forward  163        GCACATTACGAGCTCAGTGC       47629942  47633045
                                      reverse  164        CTACCAGGAGAACAGCACAGG      47629941  47633044
         E2_MSH2-v2       355         forward  165        TGGGTTAGCATTGTGTTAGGTG     47632899  47636029
                                      reverse  166        CCACAGGTGTGTGCCAATAG       47632898  47636028
PE3-6    E3_MSH2-v2       356         forward  167        AAGTTGCAGTTTGGCTGGTC       47635845  47638929
                                      reverse  168        TTATCTCCAGCGGTGCTTATG      47635844  47638928
         E4_MSH2-v2       357         forward  169        TACCATAAGCACCGCTGGAG       47638906  47642053
                                      reverse  170        ACTCCACCAAGCCCAGTCTC       47638905  47642052
         E5-6_MSH2-v2     358         forward  171        TTTAGAGACTGGGCTTGGTG       47642030  47644205
                                      reverse  172        CTCTTCCCCAACAAACCTG        47642029  47644204
PE7      I6-7_MSH2-v2     359         forward  173        CCCAGTTTCAAGCGATTAAG       47651443  47654570
                                      reverse  174        AGGAAAAGCATGTTATCTCCAG     47651442  47654569
         E7_MSH2-v2       360         forward  175        TTCCGTAGCAGTAGGCATCC       47654026  47657170
                                      reverse  176        TCACCACCACCAACTTTATGAG     47654025  47657169
         I7-8_MSH2-v2     361         forward  177        TCCCAGATCTTAACCGACTTG      47656956  47660035
                                      reverse  178        ATGGCGGTTTTGTGGAATAG       47656955  47660034
PE8      E8_MSH2-v2       362         forward  179        CCCAAACAACAGCATTAGCC       47670887  47673915
                                      reverse  180        ACATCAGCCTCGGGACAAG        47670886  47673914
         I8-9a_MSH2-v2    363         forward  181        TGAGCCCGTTGAATATAGTGG      47673830  47675514
                                      reverse  182        AGTTTTCCTAAACGGGATGATG     47673829  47675513
         I8-9b_MSH2-v2    364         forward  183        ATGGGTGTGCACGTGTGTAG       47675368  47678365
                                      reverse  184        GCCATGTGCAATTGTGAGTC       47675367  47678364
PE9      E9_MSH2-v2       365         forward  185        CCTTGCATAGII1GCTTCTGG      47688375  47690450
                                      reverse  186        ATCATACAAGGGCCTGTTGG       47688374  47690449
         I9-10_MSH2-v2    366         forward  187        AAACAGAAATCGCCCAACAG       47690418  47692377
                                      reverse  188        TAGAGACCCACCCAGAAACG       47690417  47692376
PE10     E10_MSH2-v2      367         forward  189        CAGTCCGATTTCGTTICTGG       47692347  47695506
                                      reverse  190        CACACCTAGATTTGGCAATGG      47692346  47695505
PE11     E11_MSH2-v2      368         forward  191        TTCCATTGCCAAATCTAGGTG      47695484  47698468
                                      reverse  192        GGCCCTAGTGTTTCCTTTCC       47695483  47698467
         I11-12_MSH2-v2   369         forward  193        AAGGAAACACTAGGGCCTACAAC    47698452  47700589
                                      reverse  194        CCTGGCCTCAGTACACTITTG      47698451  47700588
PE12-14  E12_MSH2-v2      370         forward  195        AGGGATTCTCCCCACTTAGC       47700228  47702718
                                      reverse  196        ATTGGAGGACTGGCTCAAAG       47700227  47702718
         E13-14_MSH2-v2   371         forward  197        GCTTACCTTTGAGCCAGTCC       47702694  47705819
                                      reverse  198        ACATGTTCCTACCCCCAGAC       47702693  47705818
PE15-16  E15_MSH2-v2      372         forward  199        TTTCTGCATCAGTTGGTTGC       47706613  47709532
                                      reverse  200        GCCAAGTTATTGCTGCTTCAG      47706612  47709531
         E16_MSH2-v2      373         forward  201        AGCCCTGTGAGGTTGGTAAC       47709413  47712504
                                      reverse  202        TCAACAACAGCTGGAACTGC       47709412  47712503
cPP1     cPP1a_MSH2-v2    374         forward  203        CCTCTCAGGTCAGGCTTCTG       47730898  47733882
                                      reverse  204        GCTCCCGCTAGAGAAACTCC       47730897  47733881
         cPP1b_MSH2-v2    375         forward  205        GAGCGAAGCACCTAAAGCAC       47733879  47736946
                                      reverse  206        AATTGGAGGGGGTGGAGTAG       47733878  47736945
         cPP1c_MSH2-v2    376         forward  207        TGTCACCCAGTCAGGTCATC       47736760  47739876
                                      reverse  208        TTGGAAGGAATCCAACAAGG       47736759  47739875
         cPP1d_MSH2-v2    377         forward  209        TTCCCAGAACTCCTTGTTGG       47739846  47742962
                                      reverse  210        TGCAAACCCCTTCTTTTCAG       47739845  47742961
         cPP1e_MSH2-v2    378         forward  211        ACCCCATGCAGAAGCAATAG       47743027  47746218
                                      reverse  212        AAATCCTGAAGGTGGGTTCC       47743026  47746217

MLH1-v2

Name of  Name of          SEQ ID NO   For/Rev  SEQ ID NO  sequence (5′-3′)           start     end
probe    fragment         (fragment)           (primer)
tPP1     tPP1b_MLH1-v2    379         forward  213        AGTTTCAGCCATGTTGCAG        37005587  37005605
                                      reverse  214        TTGGCAAAATTGTGACTGAG       37007511  37007530
         tPP1c_MLH1-v2    380         forward  215        CAGTCACAATTTTGCCAAGG       37007513  37007532
                                      reverse  216        AGTTCGTGGCATCTAACTATCG     37009688  37009709
         tPP1d_MLH1-v2    381         forward  217        GGTCCATGTGCTCCAAAAAG       37009460  37009479
                                      reverse  218        TCCAAAACTGGGAACAAACC       37012624  37012643
         tPP1e_MLH1-v2    382         forward  219        TGGTTTGTTCCCAGTTTTGG       37012623  37012642
                                      reverse  220        TAGTGCACCACAGCCTCAAG       37015706  37015725
         tPP1f_MLH1-v2    383         forward  221        GGATCACTTGAGGCTGTGGT       37015700  37015719
                                      reverse  222        TCCAACAACTGCTGTGAAGG       37018677  37018696
         tPP1g_MLH1-v2    384         forward  223        CACCACTGACCTTCCCTTCC       37018492  37018511
                                      reverse  224        GCACAGAAAGACAAATATCACATGC  37020534  37020558
         tPP1h_MLH1-v2    385         forward  225        CTCTTCCTCGTCTCCTCCTG       37020430  37020449
                                      reverse  226        CCAATTCAATGCAAAACCTG       37022464  37022483
PE1-2    E1_MLH1-v2       386         forward  227        CGAGCAGCTCTCTCTTCAGG       37034273  37034292
                                      reverse  228        AGCCTATAAGCACAGACCAACTG    37037250  37037272
         E2_MLH1-v2       387         forward  229        TTCTCTAGCAGTTGGTCTGTGC     37037242  37037263
                                      reverse  230        ACCCTGCATTCCAAACTCAC       37039199  37039218
PE3-4    I23_MLH1-v2      388         forward  231        GTTCATTTTGGGGCATGTTC       37039148  37039167
                                      reverse  232        CTGCAACCTCCTTTGAGACAG      37042218  37042238
         E3_MLH1-v2       389         forward  233        TGTCTCAAAGGAGGTTGCAG       37042219  37042238
                                      reverse  234        CCAAAATGAAACTGCCTTCC       37044367  37044386
         E4_MLH1-v2       390         forward  235        AGTTCCCTGGGTCATTTTCC       37044393  37044412
                                      reverse  236        TTGTGGGAAGGCAAACTAGC       37046381  37046400
PE5-6    E5_MLH1-v2       391         forward  237        CCTGTGCTAGTTTGCCTTCC       37046376  37046395
                                      reverse  238        GGTGGTCACCGTGGTAAAAG       37049553  37049572
         E6_MLH1-v2       392         forward  239        GACCACCATGTGATTTCCAAG      37049566  37049586
                                      reverse  240        TTGGTTGGCGGTTATTTCTC       37052510  37052529
PE7-9    E7-8_MLH1-v2     393         forward  241        TAACCGCCAACCAAGAAAAG       37052516  37052535
                                      reverse  242        TGTCTGGAGACCTTCCCAAG       37055360  37055379
         E9_MLH1-v2       394         forward  243        TGTGCTAGATGCCTCACTGG       37055182  37055201
                                      reverse  244        ACTTGCCTACATTGCCCATC       37058175  37058194
PE10-11  E10_MLH1-v2      395         forward  245        ATGGGCAATGTAGGCAAGTC       37058176  37058195
                                      reverse  246        TCTGCAGCCATGAATAAGTCC      37061070  37061090
         E11_MLH1-v2      396         forward  247        CAGAGCTGAGGCGATAAATTG      37060960  37060980
                                      reverse  248        TGCTCCTCTCCAATCCATTC       37063973  37063992
PE12-13  E12_MLH1-v2      397         forward  249        ATACTTTCCCAGCCCAAACC       37066434  37066453
                                      reverse  250        TGATGGGGAAATGAGAGGAG       37069438  37069457
         E13_MLH1-v2      398         forward  251        AGTGGCCTTTGTCCATTGAG       37069405  37069424
                                      reverse  252        GACAGAGGTGAGAGCCTAGGAG     37071540  37071561
PE14-15  E14-15_MLH1-v2   399         forward  253        AATGTGTTGGGGAAGTGGTC       37081262  37081281
                                      reverse  254        TTTGGACCACGGCTTTAGAC       37084405  37084424
PE16-19  E16-18_MLH1-v2   400         forward  255        AAGCTGAGGTCACGGATTTG       37087522  37087541
                                      reverse  256        GATGGGCAAGTTTCATCTCC       37090568  37090587
         E19_MLH1-v2      401         forward  257        TGGGACGAAGAAAAGGAATG       37090401  37090420
                                      reverse  258        CACCGTGCCTCAGCCTATAC       37093446  37093465
cPP1     cPP1a_MLH1-v2    402         forward  259        GGACTAACCCACCTCCCTTC       37103239  37103258
                                      reverse  260        GCTATAGGCAGCCCAGAGTG       37106372  37106391
         cPP2a_MLH1-v2    403         forward  261        GCCAGACTCTCGTTCCATTC       37106390  37106409
                                      reverse  262        AGGATTTGCCGTATGGACTC       37109450  37109469
         cPP3a_MLH1-v2    404         forward  263        TCGCCCAAAGTCACAGTAAG       37109303  37109322
                                      reverse  264        GATCTGTAGGCCCAGGATTTC      37112356  37112376
         cPP4a_MLH1-v2    405         forward  265        AGGGGTTTCTATGGCTGGTC       37112314  37112333
                                      reverse  266        CCTCCCTCAAACCTCCTCTC       37114423  37114442
         cPP5a_MLH1-v2    406         forward  267        TTCTCCTGCAGAGGAAGAGG       37114369  37114388
                                      reverse  268        TTGGAATTTGTCCTGGTGTG       37117519  37117538
         cPP6a_MLH1-v2    407         forward  269        AAAGCCAGGGAGTGAATGG        37117566  37117584
                                      reverse  270        ATGTGCATCTCCCTGGTGAC       37120703  37120722
         cPP7a_MLH1-v2    408         forward  271        TGTGGGGAAATCAAAACCTG       37120784  37120803
                                      reverse  272        GGGTAGACTGTGCGTGTGTG       37123930  37123949
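
As an illustration of how the coordinates listed for the MLH1-v2 fragments can be read, the following minimal sketch estimates the length of an amplified fragment from its primer coordinates. It assumes, which the table itself does not state, that "start" and "end" are inclusive positions of each primer on a single reference strand, so that a fragment spans from the start of the forward primer to the end of the reverse primer. The function and variable names are hypothetical.

```python
# Coordinates copied from the MLH1-v2 rows of the primer table:
# fragment name -> (forward primer start, reverse primer end).
# Assumption: both values are inclusive positions on the same reference strand.
PRIMER_SPANS = {
    "tPP1b_MLH1-v2": (37005587, 37007530),
    "E1_MLH1-v2": (37034273, 37037272),
    "cPP7a_MLH1-v2": (37120784, 37123949),
}


def amplicon_length(forward_start: int, reverse_end: int) -> int:
    """Return the inclusive span, in bp, from the first base of the
    forward primer to the last base of the reverse primer."""
    if reverse_end < forward_start:
        raise ValueError("reverse primer end lies before forward primer start")
    return reverse_end - forward_start + 1


def amplicon_lengths(spans: dict) -> dict:
    """Map each fragment name to its estimated length in bp."""
    return {name: amplicon_length(start, end) for name, (start, end) in spans.items()}


if __name__ == "__main__":
    for name, length in amplicon_lengths(PRIMER_SPANS).items():
        print(f"{name}: {length} bp")
```

Under this assumption the three example fragments come out at roughly 2 to 3 kb each. The MSH2-v2 rows appear to list coordinates in a different way, so the sketch is not applied to them.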

TABLE 2

                      MLH1-v2    MLH1-v1    MLH1       MSH2-V2    MSH2-V1    MSH2
                      probe      probe      region     probe      probe      region
sum length (bp)       86366      55582      121536     106534     73609      171394
repeat length (bp)    44684      18525      64712      53243      22133      94584
total repeat (%)      51.74      33.33      53.25      49.98      30.07      55.19
SINE (%)              24.93      2.58       23.85      34.68      5.03       35.95
ALUs (%)              22.38      0.09       21.85      32.85      0.76       34.15
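
The "total repeat" row of Table 2 appears to be the repeat length divided by the sum length, expressed as a percentage. The following minimal sketch, with hypothetical function names, recomputes that ratio from the tabulated lengths and compares it with the reported value. How the SINE and ALU percentages were obtained is not stated here, so they are not recomputed.

```python
# Values copied from Table 2:
# column -> (sum length in bp, repeat length in bp, reported total repeat in %).
TABLE_2 = {
    "MLH1-v2 probe": (86366, 44684, 51.74),
    "MLH1-v1 probe": (55582, 18525, 33.33),
    "MLH1 region": (121536, 64712, 53.25),
    "MSH2-V2 probe": (106534, 53243, 49.98),
    "MSH2-V1 probe": (73609, 22133, 30.07),
    "MSH2 region": (171394, 94584, 55.19),
}


def repeat_percent(sum_length_bp: int, repeat_length_bp: int) -> float:
    """Repeat content as a percentage of the total length."""
    return 100.0 * repeat_length_bp / sum_length_bp


def check_table(table: dict, tolerance: float = 0.01) -> dict:
    """For each column, report whether the recomputed percentage agrees
    with the reported one to within the given tolerance."""
    return {
        name: abs(repeat_percent(total, repeat) - reported) < tolerance
        for name, (total, repeat, reported) in table.items()
    }


if __name__ == "__main__":
    for name, (total, repeat, reported) in TABLE_2.items():
        print(f"{name}: computed {repeat_percent(total, repeat):.2f} %, reported {reported:.2f} %")
```

Read this way, the v1 probe sets carry roughly 30 to 33 % repeats, compared with about 50 % or more for the v2 probe sets and for the full gene regions.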

REFERENCES

1. “Gene copy number variation and common human disease”, Fanciulli, et al., Clinical Genetics, 2010, 77, 201-213.
2. “Dynamic molecular combing: stretching the whole human genome for high-resolution studies”, Michalet, et al., Science, 1997, 277, 1518-1523; and “Bar code screening on combed DNA for large rearrangements of the BRCA1 and BRCA2 gene in French breast cancer families”, Gad, et al., J. Medical Genetics, 2002, 39, 817-821.
3. “Sequence-based design of single-copy genomic DNA probes for fluorescence in situ hybridization”, Rogan, et al., Genome Res., 2001, 11, 1086-94.
4. “A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis”, Erik L. L. Sonnhammer and Richard Durbin, Gene, 1995, 167, GC1-10.
5. “Microsatellite instability, mismatch repair deficiency and genetic defects in human cancer cell lines”, Boyer J. C., et al., Cancer Research, 1995, 55, 6063-6070.
6. “Primer3Plus, an enhanced web interface to Primer3”, Untergasser A., et al., Nucleic Acids Research, 2007, 35, W71-W74.

1-48. (canceled)
49. A kit comprising a set of short probes hybridizing specifically on the MSH2 gene or on the MLH1 gene, and suitable for the detection of rearrangements within said MSH2 gene or MLH1 gene, wherein at least one short probe comprises a label for detection and wherein, for each of detection, (i) the set of short probes comprises a set of probes that taken together hybridize to a continuous stretch of more than 12 kb of the MSH2 gene or of the MLH1 gene; or (ii) the kit further comprises a set of long probes, wherein the long probes bind to sequences outside the MSH2 gene or the MLH1 gene and do not overlap the short probe sequences,
wherein the short probe sequence(s) specific of the MSH2 gene are obtained by amplification on human genomic DNA using primer pairs, wherein the primer pairs are selected from the group consisting of the sequences of SEQ ID NO: 21 and SEQ ID NO: 22, the sequences of SEQ ID NO: 23 and SEQ ID NO: 24, the sequences of SEQ ID NO: 25 and SEQ ID NO: 26, the sequences of SEQ ID NO: 27 and SEQ ID NO: 28, the sequences of SEQ ID NO: 29 and SEQ ID NO: 30, the sequences of SEQ ID NO: 31 and SEQ ID NO: 32, the sequences of SEQ ID NO: 33 and SEQ ID NO: 34, the sequences of SEQ ID NO: 35 and SEQ ID NO: 36, the sequences of SEQ ID NO: 37 and SEQ ID NO: 38, the sequences of SEQ ID NO: 39 and SEQ ID NO: 40, the sequences of SEQ ID NO: 41 and SEQ ID NO: 42, the sequences of SEQ ID NO: 43 and SEQ ID NO: 44, the sequences of SEQ ID NO: 45 and SEQ ID NO: 46, the sequences of SEQ ID NO: 47 and SEQ ID NO: 48, the sequences of SEQ ID NO: 49 and SEQ ID NO: 50, the sequences of SEQ ID NO: 51 and SEQ ID NO: 52, the sequences of SEQ ID NO: 53 and SEQ ID NO: 54, the sequences of SEQ ID NO: 55 and SEQ ID NO: 56, the sequences of SEQ ID NO: 57 and SEQ ID NO: 58, the sequences of SEQ ID NO: 59 and SEQ ID NO: 60, the sequences of SEQ ID NO: 163 and SEQ ID NO: 164, the sequences of SEQ ID NO: 165 and SEQ ID NO: 166, the sequences of SEQ ID NO: 167 and SEQ ID NO: 168, the sequences of SEQ ID NO: 169 and SEQ ID NO: 170, the sequences of SEQ ID NO: 171 and SEQ ID NO: 172, the sequences of SEQ ID NO: 185 and SEQ ID NO: 186, the sequences of SEQ ID NO: 187 and SEQ ID NO: 188, the sequences of SEQ ID NO: 189 and SEQ ID NO: 190, the sequences of SEQ ID NO: 191 and SEQ ID NO: 192, the sequences of SEQ ID NO: 193 and SEQ ID NO: 194, the sequences of SEQ ID NO: 195 and SEQ ID NO: 196, the sequences of SEQ ID NO: 197 and SEQ ID NO: 198, the sequences of SEQ ID NO: 199 and SEQ ID NO: 200, and the sequences of SEQ ID NO: 201 and SEQ ID NO: 202; and
wherein the short probe sequence(s) specific of the MLH1 gene are obtained by amplification on human genomic DNA using primer pairs, wherein the primer pairs are selected from the group consisting of the sequences of SEQ ID NO: 95 and SEQ ID NO: 96, the sequences of SEQ ID NO: 97 and SEQ ID NO: 98, the sequences of SEQ ID NO: 99 and SEQ ID NO: 100, the sequences of SEQ ID NO: 101 and SEQ ID NO: 102, the sequences of SEQ ID NO: 103 and SEQ ID NO: 104, the sequences of SEQ ID NO: 105 and SEQ ID NO: 106, the sequences of SEQ ID NO: 107 and SEQ ID NO: 108, the sequences of SEQ ID NO: 109 and SEQ ID NO: 110, the sequences of SEQ ID NO: 111 and SEQ ID NO: 112, the sequences of SEQ ID NO: 113 and SEQ ID NO: 114, the sequences of SEQ ID NO: 115 and SEQ ID NO: 116, the sequences of SEQ ID NO: 117 and SEQ ID NO: 118, the sequences of SEQ ID NO: 119 and SEQ ID NO: 120, the sequences of SEQ ID NO: 121 and SEQ ID NO: 122, the sequences of SEQ ID NO: 227 and SEQ ID NO: 228, the sequences of SEQ ID NO: 229 and SEQ ID NO: 230, the sequences of SEQ ID NO: 231 and SEQ ID NO: 232, the sequences of SEQ ID NO: 233 and SEQ ID NO: 234, the sequences of SEQ ID NO: 235 and SEQ ID NO: 236, the sequences of SEQ ID NO: 237 and SEQ ID NO: 238, the sequences of SEQ ID NO: 239 and SEQ ID NO: 240, the sequences of SEQ ID NO: 241 and SEQ ID NO: 242, the sequences of SEQ ID NO: 243 and SEQ ID NO: 244, the sequences of SEQ ID NO: 245 and SEQ ID NO: 246, and the sequences of SEQ ID NO: 247 and SEQ ID NO: 248; and
wherein the long probe sequence(s) specific of the MSH2 gene are obtained by amplification on human genomic DNA using primer pairs, wherein the primer pairs are selected from the group consisting of the sequences of SEQ ID NO: 61 and SEQ ID NO: 62, the sequences of SEQ ID NO: 63 and SEQ ID NO: 64, the sequences of SEQ ID NO: 65 and SEQ ID NO: 66, the sequences of SEQ ID NO: 67 and SEQ ID NO: 68, the sequences of SEQ ID NO: 69 and SEQ ID NO: 70, the sequences of SEQ ID NO: 71 and SEQ ID NO: 72, the sequences of SEQ ID NO: 73 and SEQ ID NO: 74, and the sequences of SEQ ID NO: 75 and SEQ ID NO: 76; and
wherein the long probe sequence(s) specific of the MLH1 gene are obtained by amplification on human genomic DNA using primer pairs, wherein the primer pairs are selected from the group consisting of the sequences of SEQ ID NO: 123 and SEQ ID NO: 124, the sequences of SEQ ID NO: 125 and SEQ ID NO: 126, the sequences of SEQ ID NO: 127 and SEQ ID NO: 128, the sequences of SEQ ID NO: 129 and SEQ ID NO: 130, the sequences of SEQ ID NO: 131 and SEQ ID NO: 132, the sequences of SEQ ID NO: 133 and SEQ ID NO: 134, the sequences of SEQ ID NO: 135 and SEQ ID NO: 136, and the sequences of SEQ ID NO: 137 and SEQ ID NO: 138.

50. The kit according to claim 49 for the detection of genomic rearrangements associated with a condition selected from the group consisting of: colorectal cancer or genetic predisposition to colorectal cancer, breast cancer or genetic predisposition to breast cancer, ovarian cancer or genetic predisposition to ovarian cancer, and lung cancer or genetic predisposition to lung cancer.

51. The kit according to claim 49, wherein the kit comprises a set of long probes, wherein the long probes bind to sequences outside the MSH2 gene or the MLH1 gene and do not overlap the short probe sequences.

52. The kit according to claim 49, wherein different components of the probe sets are tagged with different labels for detection.

53. The kit according to claim 49, wherein the set of short probes comprises a set of probes that taken together hybridize to a continuous stretch of more than 12 kb of the MSH2 gene or of the MLH1 gene and at least one short probe comprises a label for detection.

54. The kit according to claim 49, wherein the short probe sequences hybridize specifically on the MSH2 gene.

55. The kit according to claim 49, wherein the short probe sequences hybridize specifically on the MLH1 gene.

56. The kit according to claim 49, wherein the long probe sequences hybridize specifically on the MSH2 gene.

57. The kit according to claim 49, wherein the long probe sequences hybridize specifically on the MLH1 gene.